Digitisation Disposal Policy Toolkit

Quality Assurance Guidance
Queensland State Archives
August 2014
Department of Science, Information Technology and Innovation
Document details
Security Classification: PUBLIC
Date of review of security classification: August 2014
Authority: Queensland State Archives
Author: Queensland State Archives
Document Status: Final Version
Version: Version 1.1
Contact for enquiries
All enquiries regarding this document should be directed in the first instance to:
Manager, Agency Services
Queensland State Archives
07 3037 6630
rkqueries@archives.qld.gov.au
Copyright
Digitisation Disposal Policy Toolkit – Quality Assurance Guidance
Copyright © The State of Queensland (Department of Public Works) 2010
Licence
Digitisation Disposal Policy Toolkit – Quality Assurance Guidance by Queensland State Archives is
licensed under a Creative Commons Attribution 2.5 Australia Licence. To view a copy of this
licence, please visit http://creativecommons.org/licenses/by/2.5/au/.
Information security
This document has been security classified using the Queensland Government Information
Security Classification Framework (QGISCF) as PUBLIC and will be managed according to the
requirements of the QGISCF.
Table of contents
1 Introduction
1.1 Background
1.2 Purpose
1.3 Audience
1.4 Authority
1.5 Scope
1.6 Definitions
2 Developing and implementing quality assurance
2.1 Scanning equipment
2.2 Creation of the digitised record
2.3 Capture of metadata
3 Retention of original paper records
More Information
Appendix A – Quality Control Checklist
1 Introduction
1.1 Background
When planning and implementing a digitisation project or business processes, quality assurance
procedures and guidelines are important to ensure the digitised records meet the requirements of
their intended use.
Effective quality assurance is critical in the event the original paper record is lawfully disposed of,
as the digitised record becomes the enduring evidence of the business activity. Without strong
quality controls this evidence is at risk of being inaccurate, incomplete or illegible.
Queensland State Archives’ Digitisation Disposal Policy outlines the conditions and requirements
Queensland public authorities must meet if they wish to destroy original paper records after they
have been digitised.
Principle 3 of this Policy requires public authorities to put in place trusted systems and processes
for the capture and management of digitised records. To be compliant with this principle public
authorities must (amongst other requirements) have quality assurance procedures in place that
include:
• The timing of equipment tests and equipment calibration
• Procedures for checking output, such as what proportion of the digital reproductions will be
subject to visual inspection and how long the original records need to be retained after
digitisation to ensure that quality checking processes can be undertaken
• Procedures for re-imaging if quality standards are not met, and
• Roles and responsibilities for checking and approving output.
Robust quality assurance procedures are important because in the event of evidential challenge,
public authorities may need to demonstrate in court that trusted systems and processes were
working as they should be, on the day the digitised record was created.
1.2 Purpose
This document provides guidance for Queensland public authorities on the development and
implementation of quality assurance procedures for digitisation projects and operations.
It has been developed to assist public authorities to meet the minimum requirements of Principle 3
of the Digitisation Disposal Policy.
1.3 Audience
The primary audience for this document is staff responsible for planning or implementing
digitisation projects and business processes.
1.4 Authority
The State Archivist has issued this policy in accordance with section 25(1)(f) of the Public Records
Act 2002 (the Act). Queensland State Archives is responsible for the provision of policy relating to
a wide range of strategic information management and recordkeeping issues for Queensland
public authorities. This policy forms one part of a wider policy framework that aims to promote best
practice recordkeeping and information management in Queensland public authorities.
Under section 7 of the Act, the Chief Executive Officer of a public authority is responsible for
ensuring that the authority makes and keeps full and accurate records of its activities, and has
regard to the policies, standards and guidelines issued by the State Archivist.
1.5 Scope
This document forms part of the Digitisation Disposal Policy Toolkit. It is intended to be used in
conjunction with the other Toolkit items.
It focuses on approaches for ensuring the quality of the creation of digitised records through
scanning processes.
This document does not cover the quality control and assurance processes undertaken for specific
project management purposes, or for ensuring that the digitised record remains legible and
accessible over time.
1.6 Definitions
Digitisation-related terms and broader records and information management-specific terms are
defined in the Glossary of Archival and Recordkeeping Terms available from Queensland State
Archives’ website.
2 Developing and implementing quality assurance
Quality assurance is a critical component of digitisation activity. Quality assurance is not simply a
check on the output of digitisation, but a process that should be built into and maintained in the
ongoing operation of the digitisation work.
The International Organization for Standardization’s ISO 9000 series on quality management provides
guidance on introducing a quality assurance system within an organisation. This includes
identifying the quality measures and processes, ensuring staff are adequately educated, and
monitoring, reviewing and improving the processes.
Specific controls that should be established and maintained for any digitisation program include
assessment of the quality of:
• Scanning equipment
• Business process of creating the digitised image
• Metadata.
These quality controls are further explored in this document.
All quality assurance procedures should be documented and approved by senior management. All
quality control data (such as logs, reports, decisions) should also be captured in an agency’s
recordkeeping system. This data becomes an integral part of the image metadata and may be used to
demonstrate authenticity of the digitised images and inform future preservation decisions.
It is important to agree on and test quality measures before image capture commences, to ensure
that they can be implemented and produce acceptable results. It is also important to undertake
periodic revisions of the quality measures so that they remain relevant to the intended purpose of
the records, and reflect emerging technology, legislation, and industry trends.
Quality assurance requires the perspectives of different stakeholders and therefore a range of staff
should be consulted to determine the appropriate quality controls and measures. It is also
important to identify and clearly articulate the organisational roles and responsibilities for quality
assurance, including the frequency of these duties, and ensure all staff are adequately trained in
both their operational and quality assurance responsibilities. Where digitisation has been
outsourced, a public authority should assign roles and responsibilities for ensuring quality
standards are monitored and met throughout the duration of the outsourced arrangement, and
ensure these responsibilities are clearly articulated in contractual agreements.
A checklist of key quality assurance questions, which may be used when establishing or reviewing
digitisation activity, is provided in Appendix A.
2.1 Scanning equipment
Digitisation relies on regularly maintained and correctly calibrated hardware and software to
produce high quality images that meet quality baselines.
To establish acceptable levels of quality for digital image capture, the scanning hardware system
should be tested by the use of scanner test targets or charts (see examples in Figure 1). These
contain a wide range of material that allows output to be judged in carefully measured
increments for aspects such as resolution, text, fonts, line widths, colour, tonal range, handwriting
and halftone.
Figure 1 – Standard “targets” can be used to test the functionality of digitisation equipment
Photocopies of these test targets should not be used for calibration purposes as the process of
copying degrades the quality of the material being used to benchmark the results.
To undertake calibration testing, the test target is scanned and the quality of the image is checked
against the benchmark settings. To establish a benchmark, the test target should be scanned at a
high resolution, and in full scale view. The technical settings that allow an appropriate level of
legibility, clarity, and range of tones and colours of the digitised image should be assessed and
documented. The International Standard – ISO 12653 - Electronic imaging: Test target for the
black-and-white scanning of office documents Parts 1 and 2 contains guidance on evaluating the
output quality of a black-and-white scanning system for office documents, against a specified
target document. ISO 12641 - Graphic technology - Prepress digital data exchange - Colour
targets for input scanner calibration provides advice related to evaluating the output quality of
colour scanning systems.
If the calibration test results are outside of predefined bounds, then remedial action should take
place with the calibration process repeated until the parameters are within limits.
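The test-and-repeat cycle described above can be sketched in code. The following is a minimal illustration only, not a prescribed implementation: the measurement names (such as resolution_ppi), the bounds, and the scan_target and remediate callables are all hypothetical placeholders that an agency would replace with values and procedures agreed with its hardware and software suppliers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bound:
    """Acceptable range for one measurement taken from the scanned test target."""
    low: float
    high: float


def out_of_bounds(measured, bounds):
    """Return the names of measurements that are missing or outside their bounds."""
    failures = []
    for name, bound in bounds.items():
        value = measured.get(name)
        if value is None or not (bound.low <= value <= bound.high):
            failures.append(name)
    return failures


def run_calibration(scan_target, remediate, bounds, max_attempts=3):
    """Scan the target, then apply remedial action and rescan until within bounds.

    scan_target and remediate are hypothetical callables supplied by the caller.
    Returns (passed, log); the log can be kept as quality control data.
    """
    log = []
    for attempt in range(1, max_attempts + 1):
        failures = out_of_bounds(scan_target(), bounds)
        log.append({"attempt": attempt, "failures": failures})
        if not failures:
            return True, log
        if attempt < max_attempts:
            remediate(failures)
    return False, log
```

The returned log records each attempt and which measurements failed, so it could be captured in the recordkeeping system alongside other quality control data.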
The frequency of the calibration testing will depend on the volume of use. Some equipment may
not have any calibration settings that are user-adjustable, and may only need calibration at
servicing or maintenance periods.
Exact parameters and suggested intervals for calibration can be determined with input from an
agency’s hardware and software suppliers, and should be documented with other quality controls.
2.2 Creation of the digitised record
Almost all of the individual steps involved in converting a paper record into a readable and
accessible digital image can have quality assessments placed against them. The key is to
understand how the work is performed and identify key points at which quality checks should be
made.
Checking the quality of the output of a digitisation process entails consideration of a range of
issues that include:
2.2.1 The extent of the quality checking
Quality of digitised image
Baselines for acceptable and unacceptable characteristics of the digitised image must be
established so that a consistent level of quality can be maintained. These may be general, perhaps
simply requiring that each digital file be visually compared to the original paper record, or complex,
involving quantitative analysis of digital images using computer equipment to ensure that the
properties of a digital file meet accepted international standards.
The complexity and detail of quality baselines will depend on the project’s aims and the nature of
records involved. Strict and detailed quality control should be applied to digital images if the project
intends to destroy the original paper records, convert an important collection for long-term access,
or make high quality reproductions of a paper document.
The quality of images can be evaluated using software to examine technical aspects of images.
For example, noise in images is caused by random pixel fluctuations, and may make images
appear grainy. Software can be used to measure the level of noise in images, to check that it is
minimised to an acceptable level.
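As a rough illustration of how software might measure noise, the sketch below takes the standard deviation of grey levels inside a patch that is expected to be uniform (for example, a solid grey patch on a test target). This is one simple approach among many and is not drawn from any particular product; the image representation (a list of rows of 0 to 255 grey values), the patch coordinates and the max_noise threshold are assumptions made for the example.

```python
from statistics import pstdev


def patch_noise(pixels, left, top, width, height):
    """Standard deviation of grey levels inside a rectangular patch.

    pixels is a list of rows, each row a list of grey values (0-255).
    A perfectly uniform patch gives 0.0; larger values indicate more noise.
    """
    values = [
        pixels[y][x]
        for y in range(top, top + height)
        for x in range(left, left + width)
    ]
    return pstdev(values)


def noise_acceptable(pixels, patch, max_noise):
    """True if the patch noise is at or below the agreed baseline (max_noise)."""
    left, top, width, height = patch
    return patch_noise(pixels, left, top, width, height) <= max_noise
```

In practice the threshold would be set when quality baselines are established and tested before image capture commences.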
Some digitisation processing and management software may have the ability to modify the
appearance of a digitised record by adding information such as the date or organisation name.
Two such techniques are watermarking and fingerprinting. Watermarking is the inclusion of static
information on an image at time of storage, perhaps the name of the organisation and date of
capture. Fingerprinting typically includes information generated when the image is accessed, such
as login name of the end user and date / time information.
While this information may be useful, and the inclusion of it as part of the image convenient, these
modified images are no longer a true and accurate copy of the original paper records. This is
especially relevant where added information, such as a large watermark through the text, makes
the content of the record difficult to read. Public authorities should instead retain the digitised
record as an unmodified representation of the original paper record and capture this additional
information as metadata rather than as part of the image.
Some quality control relies on human judgement. Human judgement is often subjective and
therefore results of visual inspections may vary from person to person. If a number of staff are
responsible for visual inspections, training should be provided to communicate qualitative
information effectively and additional quality checking performed by supervisory staff to help
ensure a greater level of consistency.
Optical Character Recognition (OCR) may be used so that the text depicted in a scanned paper
document can be extracted as a text file or word processor document. OCR software is required to
recognise the text contained in the image and usually provides search and export capabilities.
OCR is rarely a fully automated process and may require operator intervention to assist in
obtaining an accurate transcription of the scanned record’s text. Documents containing
handwriting, serif fonts, halftones, and background text or images or those that are damaged or
dirty may not be suited to the OCR process.
Public authorities may choose to use techniques to routinely make the digitised image more
accurately resemble the original, for example, ‘sharpening’ and/or ‘clipping’ of highlights or
shadows, ‘blurring’ to eliminate scratches, ‘spotting’ or ‘de-speckling’ to touch up specific areas of a
digital image. Some software may automatically correct imperfections and the extent of these
processes can be set through tolerance levels. Where touch-up occurs, these processes should be
assessed to make sure information is not lost (for example, if the tolerances are set too high, the
dots above the letter ‘i’ may be removed). Processes employed should be documented so as to
help ensure the authenticity and completeness of the records is not at risk of being challenged.
Aspects against which the digitised output could be inspected and checked include:¹
• Has the smallest detail been legibly captured? (e.g. smallest type size for text; clarity of
punctuation marks, including decimal points)
• Are all details complete? (e.g. acceptability of broken characters, missing segments of lines,
missing information at the edges of the image area, images cropped or incomplete)
• Do the dimensions accurately compare with the original?
• Has scanner-generated speckle been removed? (i.e. speckle not present on the original)
• Do the colours accurately compare with the original? (e.g. density of solid black areas – too
light? too dark?; colour fidelity)
• Is the sharpness of the image comparable to the original? (e.g. lack of sharpness or too much
sharpening; unnatural appearance and halos around dark edges)
• Where optical character recognition² (OCR) is used, is the captured text accurate?
¹ Adapted from the Archives New Zealand Digitisation Standard.
² OCR is a process in which printed characters are scanned, recognised and coded – The Australian Concise Oxford
Dictionary, Third Edition, 1997.
Completeness of digitisation
To ensure all of the required paper records are digitised, checks should be conducted on the
completeness of the work. These include validating the number of input paper documents against
the number of digitised images created and, for multi-page items, checking that the number of
pages within a document accurately reflects the input bundle and that the pages are structured
and arranged in the correct order.
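A completeness check of this kind can be automated where an input manifest exists. The sketch below is illustrative only and assumes two hypothetical inputs: expected_pages, a mapping of document identifier to the page count recorded when the paper bundle was prepared, and scanned_pages, a mapping of document identifier to the page numbers in the order they were captured.

```python
def completeness_issues(expected_pages, scanned_pages):
    """Compare an input manifest with the digitised output.

    expected_pages: dict of document id -> number of pages in the paper bundle.
    scanned_pages:  dict of document id -> list of page numbers as captured.
    Returns a list of (document id, description) tuples; empty if complete.
    """
    issues = []
    for doc_id, expected in expected_pages.items():
        pages = scanned_pages.get(doc_id)
        if pages is None:
            issues.append((doc_id, "not digitised"))
        elif len(pages) != expected:
            issues.append((doc_id, f"expected {expected} pages, found {len(pages)}"))
        elif pages != list(range(1, expected + 1)):
            issues.append((doc_id, "pages out of order or misnumbered"))
    # Output that has no matching entry in the manifest also needs investigation.
    for doc_id in sorted(scanned_pages.keys() - expected_pages.keys()):
        issues.append((doc_id, "not in input manifest"))
    return issues
```

An empty result indicates that document counts, page counts and page order all match the manifest; it says nothing about image quality, which is checked separately.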
2.2.2 The amount to check
An important aspect of quality control is determining the proportion of digitised images that will be
checked. All digital images can be tested, or a representative sample of digitised documents may
be selected. Testing all digital images will ensure that all images meet the minimum required
quality levels, but can be very time and resource intensive. If, however, only a sample is tested,
care must be taken to ensure that the sample is representative of the range of records digitised
and includes examples of source documents of poor quality.
In some cases, such as following equipment repairs, or if using new staff or outsourcing vendors,
each image may be checked until there is confidence that the standard is being met. However,
testing only a sample of digital images gives a lower degree of certainty that all images have met
quality baselines.
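One way to make a sample representative is to sample separately from each category of record, so that small categories such as poor-quality originals are never left out. The sketch below shows this stratified approach under stated assumptions: the stratum_of function, the category labels and the sampling fraction are hypothetical and would be defined by the public authority.

```python
import math
import random
from collections import defaultdict


def stratified_sample(images, stratum_of, fraction, rng=None):
    """Randomly select a fraction of images from every stratum.

    stratum_of maps an image to a category label (for example "good" or "poor"
    source quality). At least one image is taken from each stratum so that
    small categories are always represented in the sample.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be greater than 0 and at most 1")
    rng = rng or random.Random()
    groups = defaultdict(list)
    for image in images:
        groups[stratum_of(image)].append(image)
    sample = []
    for stratum in sorted(groups):
        members = groups[stratum]
        count = max(1, math.ceil(len(members) * fraction))
        sample.extend(rng.sample(members, count))
    return sample
```

Passing a seeded random.Random instance makes the selection repeatable, which may help when the sampling method itself needs to be documented.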
2.2.3 The location of quality checking
Quality baselines should be established for the output device that a digital record is intended for
and be verified using that device³. If a digital image is intended for printing, then the digital file
should be printed and checked against the quality baselines for printed images. If a digital image is
intended for display on a computer monitor, quality baselines should be verified on a computer
monitor.
A controlled environment is required to consistently apply quality baselines. In an uncontrolled
environment, for example with excessive glare, reflections or using an improperly set up computer
system, a high quality image may be incorrectly deemed to have not met quality baselines.⁴
The area of an image that can be seen on a monitor depends on the image pixel dimensions and
the desktop resolution. The area of an image displayed can be increased by increasing the screen
resolution or by decreasing the image resolution. Multiple images may be viewed on the screen at
one time; however, to ensure details have been captured appropriately, a number of the images
should be viewed at 100% or greater magnification.
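The relationship between image dimensions, screen resolution and magnification can be worked through numerically. The sketch below is a simplified model, assuming that at 100% magnification one image pixel occupies one screen pixel and that the whole screen is available for the image (window borders and toolbars are ignored); the function name and example dimensions are illustrative.

```python
def visible_fraction(image_px, screen_px, zoom=1.0):
    """Fraction of the image area that fits on screen at one time.

    image_px and screen_px are (width, height) in pixels.
    zoom is the magnification, where 1.0 means 100%.
    """
    img_w, img_h = image_px
    scr_w, scr_h = screen_px
    # At a given zoom, the screen can show scr / zoom image pixels per axis.
    vis_w = min(img_w, scr_w / zoom)
    vis_h = min(img_h, scr_h / zoom)
    return (vis_w * vis_h) / (img_w * img_h)
```

For example, an image of 2480 by 3508 pixels viewed at 100% on a 1920 by 1080 desktop shows a little under a quarter of its area at once, so an inspector would need to pan across the image to check all of it at that magnification.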
2.2.4 Re-Imaging
Where digitised images do not meet the documented quality standards, a public authority will need
to re-scan the image.
³ Frey F. Guides to Quality in Visual Resource Imaging: 4. Measuring Quality of Digital Masters. 2000. Council on Library
and Information Resources. Accessed February 2010 at: http://www.diglib.org/pubs/dlf091/dlf091.htm.
⁴ Moving Theory into Practice: Digital Imaging Tutorial. 2003. Cornell University Library/Research Department. Accessed
February 2010 at: http://www.library.cornell.edu/preservation/tutorial/quality/quality-02.html.
For instances when quality standards are not met during a randomly selected sampling exercise, a
procedure should be in place that provides direction on the need to re-inspect and re-scan the
remaining output.
Example approaches:
If more than 1% of the total number of images and associated metadata examined in a randomly
selected sampling are found to be defective, the entire output since the last quality check is
re-inspected. Any specific errors found in the random sampling and any additional errors found in the
re-inspection are corrected.
If less than 1% of the batch is found to be defective, then only the specific defective images and
metadata that are found are redone.⁵
Or, where a problem is located, digitised outputs on either side of the problem image can be
assessed and re-scanned until the issue has been resolved.
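The 1% example above can be expressed as a simple decision rule. The sketch below is illustrative only. The example approach does not say what happens when the defect rate is exactly 1%; this sketch assumes that case is treated the same as a rate below the threshold, which is an assumption a public authority would need to settle in its own procedure. The function name and returned labels are hypothetical.

```python
def reimaging_action(sample_size, defective_ids, threshold=0.01):
    """Decide what to redo after inspecting a random sample.

    sample_size:   number of images (with metadata) examined in the sample.
    defective_ids: identifiers of sampled items found to be defective.
    threshold:     defect rate above which all output is re-inspected.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    rate = len(defective_ids) / sample_size
    if rate > threshold:
        action = "reinspect_all_output_since_last_check"
    else:
        # Assumption: a rate exactly equal to the threshold falls here.
        action = "redo_defective_items_only"
    return {"action": action, "defect_rate": rate, "redo": sorted(defective_ids)}
```

With a sample of 200 items, three defects (1.5%) would trigger re-inspection of all output since the last quality check, while one defect (0.5%) would require only that item to be redone.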
As an aid to reduce the need for re-imaging, Appendix 7 of the Archives New Zealand Digitisation
Standard highlights common implementation and process or operator faults to avoid, which may be
useful in planning digitisation and quality checking processes.
2.3 Capture of metadata⁶
Most scanners will automatically capture technical metadata such as the camera used, date
scanned, resolution and bit-depth. The type of metadata captured may be able to be configured
through the scanner’s settings. The Digitisation Disposal Policy Toolkit: Metadata Guidance
provides further information about the capture of metadata.
An important component of quality assurance is ensuring that adequate and accurate metadata is
captured. Checks should be determined, documented and implemented to assess the quality of
metadata that are both manually entered and automatically generated. This should encompass:
• Adherence to the Queensland Recordkeeping Metadata Standard and any additional
metadata standards set by the public authority or the requirements of the digitisation project
• Relevancy and accuracy of metadata
• Accuracy of grammar, spelling and punctuation, especially for manually-keyed data
• Consistency in the creation of metadata and in interpretation of the metadata
• Synchronisation of metadata stored in more than one location – e.g. information related to the
image might be stored in the TIFF header, the management system, and other databases and
this should always be consistent, and
• Completeness of metadata – that all mandatory fields are complete.
Procedures should be established by the public authority to address poor metadata capture
revealed through the quality checks.
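Two of the checks listed above, completeness of mandatory fields and synchronisation between storage locations, lend themselves to automation. The sketch below is a minimal illustration; the field names and the location label used in any call are hypothetical and are not taken from the Queensland Recordkeeping Metadata Standard.

```python
def metadata_issues(record, mandatory_fields, copies=None):
    """Check one record's metadata for completeness and synchronisation.

    record:           dict of field name -> value from the primary system.
    mandatory_fields: field names that must be present and non-empty.
    copies:           optional dict of location name -> dict of the same
                      metadata as stored elsewhere (e.g. an image header).
    Returns a list of issue descriptions; empty if no issues were found.
    """
    issues = []
    for field in mandatory_fields:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            issues.append(f"missing mandatory field: {field}")
    for location, other in (copies or {}).items():
        # Only fields held in both places can be compared for consistency.
        for field in sorted(record.keys() & other.keys()):
            if record[field] != other[field]:
                issues.append(f"{field} differs in {location}")
    return issues
```

Checks on relevancy, spelling and consistent interpretation still depend on human review, so automated checks of this kind complement rather than replace visual inspection of metadata.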
⁵ Adapted from the Archives New Zealand Digitisation Standard.
⁶ This information is adapted from the Archives New Zealand Digitisation Standard.
It may be useful to evaluate, over time, the usefulness of the metadata being collected and, if
appropriate, make amendments to the processes and systems to ensure required metadata is
being captured.
3 Retention of original paper records
Under the Digitisation Disposal Policy, original paper records must be retained until the quality of
the digitised image has been verified. How long after quality checking the originals should be
retained is a decision for a public authority and should be assessed during the planning phase of
digitisation activity and articulated in an internal policy statement. This period should be based on
an assessment of the:
• Level of assurance that a full and accurate record has been created
• Level of assurance that the digitised image is being well managed in a recordkeeping system
• Level of assurance that the authenticity is being maintained
• Robustness of digitisation processes, including quality assurance processes
• Need for access to the original for other purposes such as legal proceedings.
Once the minimum period set by the public authority for retaining the original paper records has
lapsed, and the quality of the digitised record has been verified, original paper records may be
disposed of under the General Retention and Disposal Schedule for Original Paper Records that
have been Digitised (QDAN 656 v.2).
More Information
For more detailed guidance on the management of public records visit the Queensland State
Archives’ website at www.archives.qld.gov.au or contact us on: telephone: (07) 3037 6630 or
email: rkqueries@archives.qld.gov.au
Appendix A – Quality Control Checklist
The following checklist is intended to provide a summary of issues to be considered when
developing quality control processes. Public authorities are not required to complete the Checklist⁷.
⁷ However, the Chief Executive Officer or authorised delegate of the public authority must complete, sign and retain as a
permanent record a Compliance Declaration (Appendix 1 of the Digitisation Disposal Policy) to demonstrate compliance
with the minimum requirements of the Digitisation Disposal Policy.
1. Establishing quality assurance processes
1.1 Have quality assurance procedures been developed, approved by senior management
and communicated to stakeholders that address the:
• Scope and extent of the quality controls and assurance processes
• Measures of quality, which reflect the perspectives of different stakeholders
• Planned frequency of quality controls and assurance processes
• Roles and responsibilities of stakeholders?
1.2 Has a baseline been established for:
• calibrating equipment
• the output viewing device
• the specification and quality of captured metadata
• quality of the digitised record?
1.3 Has the frequency of undertaking calibration testing been determined and
documented?
1.4 Has the quantity of images and metadata to be checked, and the frequency of this,
been determined and documented?
1.5 Does the metadata specification for the digitisation activity adhere to the Queensland
Recordkeeping Metadata Standard?
1.6 Have procedures been established and implemented to address poor metadata
capture revealed through the quality checks?
1.7 Are the scope and extent of the use of touch-up techniques documented?
Have checks been conducted to ensure any touch-up techniques do not result in the
loss of information?
1.8 Do processes consider the need to enlarge images in order to assess the quality of
the digitised image?
1.9 Has a procedure been developed and established to enable the re-imaging of records
where they do not meet quality standards?
1.10 Are staff with responsibility for quality assurance sufficiently trained to be able to
undertake their duties?
1.11 Is responsibility for signing-off on quality checks assigned at an appropriate level?
1.12 Has a risk analysis been documented and approved by senior management to
determine the period for retaining the original paper records after digitisation has
occurred?
Does this include consideration of the:
• Level of assurance that a full and accurate record has been created
• Level of assurance that the digitised image is being well managed in a
recordkeeping system
• Level of assurance that the authenticity is being maintained
• Robustness of digitisation processes, including quality assurance processes
• The need for access to the original for other purposes such as legal
proceedings.
1.13 Have the quality measures been tested before image capture commences, to ensure
they can be implemented and produce acceptable results?
1.14 Where digitisation is outsourced, are contractual agreements in place regarding quality
assurance?


2. Reviewing quality assurance processes
2.1 Are quality measures and procedures implemented and regularly reviewed?
2.2 Is quality control data (such as logs, reports, decisions) documented and captured in a
recordkeeping system and managed as part of the digitised images’ metadata?
2.3 Are the calibration tests being undertaken in accordance with the determined
frequency?
2.4 Are tests carried out in line with ISO12653 and ISO12654?
2.5 Are original (that is, not photocopied) test targets used?
2.6 Is equipment regularly serviced?
2.7 Are a proportion of digitised images checked for aspects such as:
• Has the smallest detail been legibly captured? (e.g. smallest type size for text;
clarity of punctuation marks, including decimal points)
• Are all details complete? (e.g. acceptability of broken characters, missing
segments of lines, missing information at the edges of the image area, images
cropped or incomplete)
• Do the dimensions accurately compare with the original?
• Has scanner-generated speckle been removed? (i.e. speckle not present on
the original)
• Do the colours accurately compare with the original? (e.g. density of solid
black areas – too light? too dark?; colour fidelity)
• Is the sharpness of the image comparable to the original? (e.g. lack of
sharpness or too much sharpening; unnatural appearance and halos around
dark edges)
• Where Optical Character Recognition is used, is the captured text accurate?
2.8 Are quality checks conducted to ensure all of the planned records and all of the pages
in multi-page items are digitised?
2.9 Do the agreed samples for quality checking represent the range and quality of records
digitised?
2.10 Does the capture of metadata reflect the specification set by the public authority and
still comply with the Queensland Recordkeeping Metadata Standard?
2.11 Are the captured metadata relevant and accurate and linked to correct records and
files, especially for manually-keyed data, e.g. appropriate security level applied,
accurate creation date captured, correct document author and scanner operator
identified?
2.12 Can metadata be interpreted consistently?
2.13 Are metadata that are stored in more than one location synchronised?
2.14 Are all mandatory metadata fields complete?
2.15 Has the usefulness of the metadata being collected been assessed over time?
2.16 Are digitised records being re-imaged and metadata re-captured in line with
procedures when they do not meet quality standards?
2.17 Are staff aware of their roles and responsibilities for checking output and trained to
ensure any subjective visual inspection tests are consistent?
2.18 Is some quality checking performed by supervisory staff to help ensure a greater level
of consistency?
2.19 Are original paper records being disposed of under the General Retention and
Disposal Schedule for Original Paper Records that have been Digitised?