MEMORY OF THE WORLD REGISTER

PANDORA, Australia’s Web Archive
REF N° 2004-28
PART A – ESSENTIAL INFORMATION
1
SUMMARY
1.1
Nature of the nomination
PANDORA, Australia’s Web Archive, is a collection of copies of significant Australian online
publications and web sites issued on the Internet. The National Library of Australia and its
partners[1] are building the Archive to ensure long-term access to significant Australian
documentary heritage that is published online.
1.2
The Archive is stored and managed by the Library and can be searched and accessed via the
Internet. It is currently growing at the rate of about 2,000 new titles per year and the titles that
are added to it are selected on the basis of stringent selection criteria.
1.3
PANDORA is the first example in the world of a publicly accessible archive of web resources,
based on well-established principles of collection development and access.
1.4
PANDORA is proposed for the Memory of the World Register to highlight that information in
digital formats is as important as any other to our cultural and documentary history and needs
to be preserved. A significant proportion of the world’s documentary heritage is now
published only online. If it is not collected and managed, it will be lost to future generations.
2
DETAILS OF THE NOMINATOR
2.1
Name (person or organisation)
Janice Lillian Fullerton
Director General
National Library of Australia
2.2
Relationship to the documentary heritage nominated
The nominator is the Director General of the National Library of Australia, which is the owner
and custodian of the Archive.
2.3
Contact person(s)
Pam Gatenby
Assistant Director General
Collections Management
2.4
Contact details (include address, phone, fax, email)
Address
National Library of Australia
Parkes Place
ACT 2600
Telephone 02 6262 1672
Fax 02 6273 2545
Email pgatenby@nla.gov.au
[1] The PANDORA partners are the National Library of Australia, the Northern Territory Library and Information
Service, the State Library of Queensland, the State Library of New South Wales, the State Library of Victoria,
the State Library of South Australia, the State Library of Western Australia, ScreenSound Australia, the
Australian War Memorial, and the Australian Institute of Aboriginal and Torres Strait Islander Studies.
3
IDENTITY AND DESCRIPTION OF THE DOCUMENTARY HERITAGE
3.1
Name and identification details of the items being nominated
PANDORA, Australia’s Web Archive
http://pandora.nla.gov.au/index.html
The owner and custodian of PANDORA is the National Library of Australia. The
documentary heritage is located within the Library at
Parkes Place
Canberra
ACT 2600
Australia
3.2
Description and inventory, including cataloguing/guide or similar access information
3.2.1
PANDORA, Australia’s Web Archive, is accessible at http://pandora.nla.gov.au/index.html.
A full browsable list of the 6,000 titles currently in the Archive is available on this home page.
All titles contained in the Archive are individually catalogued according to library conventions
in the National Library’s online catalogue, as well as in the National Bibliographic Database
(NBD).
3.2.2
The National Library considers the nominated heritage to be the contents of the PANDORA
Archive, not the policy documents and technical infrastructure, though they have played a
significant role in determining the type of archive it is and its scope and evolution.
3.2.3
The National Library of Australia is committed to preserving the contents of the Archive in
perpetuity.
3.2.4
It is not possible to preserve the technical infrastructure in operational form. The Library
already keeps an historic register of the source code. It also keeps a preservation copy of the
display versions of each title and instance[2] so that the Archive could be reconstituted in its
present form if ever required in the future. Should PANDORA be accepted for inclusion on
the World Register, the National Library would be happy to document the existing contents of
the Archive and its interface by undertaking to do the following:
• Keep a snapshot of the interface;
• Copy the PANDAS[3] database on a specified date of relevance to this nomination, so
that titles and instances that are part of the Archive at this time will be recorded;
• Keep copies of the selection guidelines that have shaped the Archive available for
consultation in the future.
[2] An ‘instance’ is a single gathering of a title. It includes the gathering of a monograph that will be archived only
once, the first gathering of a serial title or integrating title (for example a web site that changes over time), and
all subsequent gatherings.
[3] PANDAS is the PANDORA Digital Archiving System developed by the National Library of Australia to support
the collection, management, provision of access to and preservation of online publications and web sites.
Bibliographic and registration details
3.2.5
PANDORA : Australia’s Web Archive [electronic resource] / National Library of Australia
and partners. Canberra : National Library of Australia, 1996 -.
PANDORA is an online collection of significant Australian publications and web sites,
containing 6,024 titles as of 26 May 2004.
The Archive is accessible at http://pandora.nla.gov.au/index.html
Summary of its provenance (for example, how and when was the material acquired and
integrated into the holdings of the institution)
3.2.6
In 1995 the National Library identified the issue of the growing amount of Australian
information published in online format only as a matter needing attention. The Library
accepted that it had responsibility to collect and preserve Australian publications, regardless of
format.
3.2.7
In 1996 the Library began to develop selection guidelines for this category of material. The
Australian Electronic Unit was established to select online publications according to the
guidelines, to negotiate with publishers for the right to archive them, and to catalogue them
onto the National Bibliographic Database.
3.2.8
Work proceeded on two levels at the same time - developing policy and theoretical models for
the work and undertaking practical experiments in web archiving, storage and access using
freely available software. The first two titles were downloaded in October 1996. By June
1997 the Archive contained 31 titles.
3.2.9
By 1998 policy, procedures and infrastructure were sufficiently developed to invite the State
libraries to become partners and in August the State Library of Victoria became the first
partner. By March 2004 five State libraries, the Northern Territory Library and Information
Service, ScreenSound Australia: the National Screen and Sound Archive, the Australian War
Memorial, and the Australian Institute of Aboriginal and Torres Strait Islander Studies had
become partners, ten contributing agencies in all.
3.2.10 To support the acquisition and management of increasing volumes of data, as well as to
support more efficient distributed archive building among partners, the Library developed the
PANDORA Digital Archiving System (PANDAS), the first release of which took place in
June 2001, with version 2 being released in August 2002. Further development of the
software commenced in the first quarter of 2004.
3.2.11 The National Library and its partners:
• select online publications for archiving;
• register them in PANDAS;
• negotiate a copyright licence with publishers (including the right to copy publications and
web sites into the Archive and provide access in perpetuity to them);
• carry out quality assessment to ensure that all the functionality of a web publication has
been captured in the archived copy; and
• create a catalogue record for them for inclusion in the National Bibliographic Database
and the local catalogues of library partners.
3.2.12 The Archive is held centrally at the National Library, which takes responsibility for
maintaining the Archive, backing it up according to standard Information Technology
management practices, and taking preservation action over time, as required.
Analysis or assessment of physical state and condition, such as description of storage
arrangements, conservation diagnosis, etc.
3.2.13 PANDORA is a collection of computer files that constitute copies of selected online
publications and web sites issued on the Internet. A title in the Archive may
consist of a single file, such as a text document in Portable Document Format (PDF), for
example, Annual Report to the NSW Environment Protection Agency
<http://pandora.nla.gov.au/tep/42658>. Or it may be a complex web object, such as a large web
site consisting of thousands of files in a variety of formats, such as text, sound, image or
video, for example, the Arafura Games <http://nla.gov.au/nla.arc-14228>.
3.2.14 PANDORA contributors preserve the ‘look and feel’ (appearance and functionality) of a
publication or web site, as well as its contents, to the greatest extent possible. With the
publisher’s permission, a harvesting robot is sent to the publisher’s site to harvest the
publication or web site and bring a copy of it back to working space within PANDAS. Staff
of the National Library and its partners then check this copy for completeness and
functionality before consigning it to the Archive for public access.
3.2.15 At least three copies of each archived item are made, one for display and two for preservation
purposes. The display copy is stored on a dedicated server at the National Library. The two
preservation copies are stored in the Library’s Digital Object Storage System, where all the
Library’s digital collections are held under secure conditions that facilitate
appropriate preservation activity as required in the future.
Visual documentation
3.2.16 The live site is available at <http://pandora.nla.gov.au/index.html>.
Bibliography
3.2.17 Beagrie, Neil. National Digital Preservation Initiatives: An Overview of Developments in
Australia, France, the Netherlands and the United Kingdom and of Related International
Activity. Washington, D.C.: Council on Library and Information Resources and Library of
Congress, 2003.
<http://www.clir.org/pubs/reports/pub116/pub116.pdf>
This report provides an overview of selected national and multinational initiatives in digital
preservation occurring outside North America, including PANDORA.
3.2.18 Cathro, Warwick, Colin Webb, and Julie Whiting. Archiving the Web: The PANDORA
Archive at the National Library of Australia. Canberra: National Library of Australia, 2001.
<http://www.nla.gov.au/nla/staffpaper/2001/cathro3.html>
3.2.19 Day, Michael. Collecting and Preserving the World Wide Web: A Feasibility Study
Undertaken for the JISC and Wellcome Trust. Bath: UKOLN, University of Bath, 2003.
<http://www.jisc.ac.uk/uploaded_documents/archiving_feasibility.pdf>
This study examines PANDORA in the context of digital archiving world wide and
recommends that the Wellcome Trust establish an archive using the PANDORA methodology
and the PANDAS software for this archiving activity.
Referees
3.2.20 Professor Ross Harvey
Library and Information Management
School of Information Studies
Charles Sturt University
Locked Bag 675
Wagga Wagga NSW 2678
Phone: +61 2 6933 2369
Email: rossharvey@csu.edu.au
Professor Harvey has a long-standing interest in preservation management. Since 1988 he has
written over 25 articles and 4 books and has accepted numerous invitations to contribute to
workshops and conferences in this field. He is currently working on a book entitled Preserving
Digital Objects: An Australian Perspective, and in preparation for this spent some time at the
National Library investigating its digital preservation program and PANDORA. Professor
Harvey is leading a group developing a register of lost and missing documentary heritage for
the Australian program of UNESCO's Memory of the World Program, and is presenting papers
on preservation at the ALIA (Australian Library and Information Association) and LIANZA
(Library and Information Association of New Zealand Aotearoa) conferences in September
2004.
3.2.21 Mr Neil Beagrie
British Library-JISC Partnership Manager
British Library
96 Euston Road
London NW1 2DB
Phone: + 44 709 204 8179
Email: Neil.Beagrie@bl.uk
Neil Beagrie has been extensively involved in the field of digital preservation in various
positions he has held in the U.K., including assistant director of the Arts and Humanities Data
Service (AHDS) and program director for digital preservation in the Joint Information
Systems Committee (JISC). He was co-author of Preservation Management of Digital
Materials, a study published by the British Library in 2001, and was the author of National
Digital Preservation Initiatives: An Overview of Developments in Australia, France, the
Netherlands, and the United Kingdom and of Related International Activity, a study published
by the Library of Congress in 2003. The latter study investigated the National Library's
PANDORA Archive and set it in the context of other key digital archiving programs taking
place world wide. He now works at the British Library as British Library-JISC Partnership
Manager.
3.2.22 Laura Campbell
Associate Librarian for Strategic Initiatives
Library of Congress
101 Independence Ave SE
Washington DC 20540
USA
Phone: +1 202 707 7849
Email: lcam@loc.gov
Ms. Campbell joined the Library of Congress in 1992 as Director of Library Distribution
Services. She is currently Director of the National Digital Library Program (NDL). In this
position, Ms. Campbell managed the innovative American Memory Program, a cooperative
effort to digitize and make available online significant items pertaining to American culture
and history. Since 2000, Ms. Campbell has also held the position of Associate Librarian for
Strategic Initiatives. As such, she is responsible for the development of the National Digital
Information Infrastructure and Preservation Program (NDIIP) in collaboration with other
cultural and heritage institutions. NDIIP is a plan approved and funded by Congress to
establish a strategy for the Library of Congress in collaboration with other federal and
nonfederal entities, to ... Investigations into the technical aspects of digital preservation have
comprised a major component of the project thus far, and eventually, the NDIIP will make
recommendations to Congress about the best options for a long-term national preservation
strategy. Ms Campbell is therefore very well aware of the importance of such work, the
requirements for trusted national archives, and the challenges involved in implementing them.
4
JUSTIFICATION FOR INCLUSION/ ASSESSMENT AGAINST CRITERIA
Authenticity
4.1.1
Yes. Authenticity of a digital archive is a complex matter that involves a number of factors. Are
the individual items within the archive faithful copies of the original and does the archive as a
whole have integrity? Authenticity of a digital archive depends on effective, ongoing
management strategies to ensure that the archive and individual items in it are not accidentally
or deliberately changed.
Policies and procedures for authenticity at the title level
4.1.2
The National Library has a number of policies and procedures in place to keep the PANDORA
Archive authentic, especially its intellectual content.
4.1.3
Substantial effort is invested in ensuring the authenticity and integrity of each individual title.
This attention is possible with selective archives and is one of their major advantages. In
contrast, whole domain archives, such as the Internet Archive or the archive of the National
Library of Sweden, collect such a large volume of material that quality assurance cannot take
place. A significant proportion of titles in whole domain archives at the present time are
therefore incomplete in content or lacking in functionality.[4]
4.1.4
In copying a publication or Web site into the Archive, the policy of the PANDORA partners is
to maintain its ‘look and feel’, that is, its appearance and functionality, as well as contents, to
the fullest extent possible. In most cases this is achievable. In some cases, however, because
of the technical set up of a site or because of the limitations of current harvesting technology,
it is not possible to exactly reproduce a site. Some functionality may be missing, for instance,
the ability to conduct a search of an archive of back issues on the web site of an e-journal.
Functions that involve interaction with publishers’ sites have to be disabled.
4.1.5
The Library is constantly working to overcome technical limitations that prevent us from
archiving some publications or parts of them. A current research project to archive and
provide access to publications structured as databases is a good case in point.
4.1.6
Where it is not possible to replicate exactly the ‘look and feel’ and functionality of a title with
current technology, the Library takes a pragmatic view and argues that it is better to
compromise on these aspects for the benefit of being able to preserve the intellectual content.
It is essential that the intellectual content be preserved exactly as produced by the publisher.
All sites are quality checked and are accurate copies of the publishers’ sites at the time of
archiving. The date of archiving of each instance is recorded by the system and is clearly
indicated.
[4] For further information about the respective advantages and disadvantages of selective and whole domain
harvesting, see section ...[comparison with other archives]
4.1.7
Preservation activity to keep titles accessible as hardware and software change may also
necessitate some changes to ‘look and feel’ and functionality. Once again, preservation of the
intellectual content exactly as produced by the publisher will be of paramount concern.
Policies and procedures for authenticity at the Archive level
4.1.8
At the Archive level, a number of policies and practices are in place to preserve authenticity.
4.1.9
At least three copies of each instance of a title are kept:
• The preservation master, which is derived from the raw output of harvested files and
associated work logs before any human or machine interaction occurs. It is therefore an exact
copy of what is downloaded from the publisher’s site and will be kept in perpetuity.
• The access master, which is the set of harvested files that have undergone quality assurance
procedures (modification of links, addition of missing images, etc.) to make the instance
suitable for public display. The access master will also be kept in perpetuity.
• The access copy, which is served to users of the Archive.
4.1.10 The preservation and access masters are stored separately from the access copy on the
Library’s secure Digital Object Storage System (DOSS). Backup tapes of both the DOSS and
the PANDORA Archive server are maintained, including copies that are stored off-site.
4.1.11 These practices contribute to the authenticity of the Archive in the following ways:
• Should the Archive fail or be destroyed or corrupted, backup copies would enable the
restoration of the Archive;
• Should a hacker succeed in tampering with one of the copies of an instance (for an explanation of
‘instance’ see footnote 2), it is highly unlikely that s/he would succeed in tampering with all
copies on all backup tapes, and the instance could be restored from the intact copies;
• Should quality control or preservation activity change an instance to an unacceptable degree,
or alter the intellectual content at all, the preservation master can be returned to for an exact
replication of the title as it was on the publisher’s site.
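The restoration scenario in 4.1.11 can be illustrated with a short sketch. It assumes, purely for illustration, that integrity is checked by comparing SHA-256 digests across the stored copies of an instance and trusting the content held by the majority. The nomination does not state which mechanism the Library uses, and the function name and copy names below are hypothetical.

```python
import hashlib
from collections import Counter


def find_intact_copy(copies: dict[str, bytes]) -> tuple[bytes, list[str]]:
    """Return the content shared by most copies, plus the names of copies that differ.

    Assumption (not from the nomination): a copy counts as intact when its
    SHA-256 digest matches the digest held by a strict majority of copies.
    """
    digests = {name: hashlib.sha256(data).hexdigest() for name, data in copies.items()}
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(copies) / 2:
        raise ValueError("no majority among copies; cannot tell which one is intact")
    # Copies whose digest disagrees with the majority are flagged for restoration.
    suspect = sorted(name for name, digest in digests.items() if digest != majority_digest)
    intact = next(data for name, data in copies.items() if digests[name] == majority_digest)
    return intact, suspect


# Hypothetical storage locations for one instance; one copy has been altered.
original = b"<html>archived instance</html>"
copies = {
    "doss_master": original,
    "onsite_tape": original,
    "offsite_tape": b"<html>tampered</html>",
}
intact, suspect = find_intact_copy(copies)
print(suspect)  # prints ['offsite_tape']
```

In this sketch the altered copy is identified because it disagrees with the other two, and it could then be overwritten from the intact content.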
Persistent identifiers
4.1.12 Another aspect of authenticity is persistence. Each item in the Archive, from title level right
down through instances and parts of instances to component files, has a unique persistent
identifier automatically assigned by the digital archiving system. This enables authors to cite
works and parts of works in the Archive using the appropriate persistent identifier. Readers
can return to the cited item in the Archive again and again, confident that it will remain there
persistently and that it will be the same.
World significance, uniqueness and irreplaceability
Uniqueness
4.2.1
The PANDORA Archive is unique. It is recognised by the library and archives sector as the
leader of its type (a selective archive) in the world. It is the only archive to be built
collaboratively. The two overseas studies cited in the bibliography (see paragraphs 3.2.17 &
3.2.19) indicate its importance in world terms and place it in context.
4.2.1.1 PANDORA is the first example in the world of a publicly accessible archive of web resources,
based on well-established principles of collection development and access, that has the
following characteristics:
• It is selective and developed according to clearly defined and published selection
guidelines.
• It has been more adventurous and innovative than other web archives in its approach to
collecting a wide range of formats and includes both static and dynamic[5] publications
and web sites. It therefore represents a wide range of publication types and formats
employed by publishers and creators on the Web.
• Each title that is added to the Archive is quality assessed to ensure completeness of
intellectual content and functionality.
• Each title in the Archive is catalogued, with a record in the National Library’s and other
partners’ online catalogues, as well as in the National Bibliographic Database[6]. It
therefore conforms to the IFLA recommendation[7] that online resources be included in
the national bibliography.
• It provides networked access to researchers world-wide.
• Up until now it has been the only web archive managed by a cultural heritage institution
that is built collaboratively with other partners.
Dynamic as well as static publications
4.2.1.2 PANDORA is unique. It is a selective archive that includes dynamic publications and web
sites as well as static publications. Other national archives, such as those of the National
Libraries of Canada, Denmark, and Japan, have been set up to collect online static
publications, that is, documents that are accessed by hypertext links only.
4.2.2
Irreplaceable.
For an estimated six per cent of the titles in the Archive the publisher’s site no longer exists on
the live Web and the title is therefore irreplaceable. As time passes, this percentage will
increase rapidly.
4.2.2.1 This does not, however, represent the full extent of the irreplaceable material in the Archive.
While live publishers’ sites remain for most of the titles in the Archive, many of these sites
have changed over time and PANDORA has captured this changing content through repeat
gatherings. While it is not possible to quantify the amount of material in this category which
is now unique to PANDORA, it is estimated to be a significant amount. Examples include:
• Fineart Forum: Art + Technology Net News <http://pandora.nla.gov.au/tep/11009>. This
e-zine commenced publication on the Internet in 1987 before the creation of the World
Wide Web. From 1987 to 1995 its issues are in very plain text with no images. From
1996, however, it started to take advantage of the design and navigation features offered
by HTML[8]. While it remains a static publication, it frequently changes design and the
Archive records the evolution of its design, as well as preserving its intellectual content
for posterity. See Appendices 4 & 5 for images of early and late versions of this
publication.
• Worlds of Sara Douglass <http://pandora.nla.gov.au/tep/10349>.
The site was archived every year from 1998 to 2002, when it changed on a regular basis.
• Prime Minister of Australia, John Howard <http://pandora.nla.gov.au/tep/10052>.
Repeated harvests of this site document change from 1998 to 2004 in the web presence of
a major political figure.
[5] A ‘dynamic’ web resource is defined as ‘A web document that is created from a database in real-time or "on
the fly" at the same time it is being viewed, providing a continuous flow of new information and giving visitors a
new experience each time they visit the web site.’ This definition is given in www.about-the-web.com: An
Internet Guide for Newcomers to the World Wide Web. It is available online at
<about-the-web.com/shtml/glossary.shtml>. Consulted 30 June 2004.
[6] The National Bibliographic Database is a union catalogue of records of over 850 Australian libraries, access to
which is provided by the Kinetica service. It is available online at <http://www.nla.gov.au/kinetica/>.
Consulted 12 March 2004.
[7] International Federation of Library Associations (IFLA). The final recommendations of the International
Conference on National Bibliographic Services, 1998. Available online at
<http://www.ifla.org/VI/3/icnbs/fina.htm>. Consulted 3 May 2004.
4.2.2.2 PANDORA documents the early years of publication on the Australian Internet. In doing so it
also documents the early years of the world Internet. Because it was the second national
archive to be established, because it is the only selective archive to actively collect dynamic
publications and web sites, and because it undertakes quality assurance of archived titles to
make sure they work as they are supposed to, it is likely that PANDORA is the only repository
for some types of early online publications and web sites that have now disappeared from the
Web.
4.2.3
Significance
PANDORA is significant in world terms. It contains Web sites and publications chosen
because of their social, political, cultural, religious, scientific and economic significance, an
increasing number of which are no longer available anywhere else.
4.2.3.1 For example, Sydney 2000: Official Site of the Sydney 2000 Olympic Games
<http://pandora.nla.gov.au/tep/10194> is one of the most heavily used titles in the Archive
and is becoming even more popular as the 2004 Olympics approach. It disappeared from
the live Web soon after the close of the Games in 2000 and it was the first web site for an
Olympic games to have been captured. This site was captured a number of times to record
change leading up to the Games and then every day while the Games were underway.
4.2.3.1
PANDORA is historically significant because:
• It includes publications and web sites that once existed on the Internet but have now
disappeared. The only way now to refer to these is via the PANDORA Archive. In
addition, it includes significant publications that are only in online form and which will
in time disappear from the publishers’ web sites.
• PANDORA was established in 1996[9], just three years after the creation of the World
Wide Web in 1993. It therefore provides a record of the early years of this new and
revolutionary publication and communication medium, which is freely available to
researchers.
• Its objective is to maintain access in perpetuity to resources in it by taking appropriate
preservation action. It is not just for short-term access purposes.
• It recognises that online or digital publishing forms an important part of a nation’s
heritage.
• PANDORA has been influential in the development of web archiving. It demonstrates
that it is possible, with firmly based business models and objectives that govern
operations, for national libraries and other cultural heritage institutions to treat web
resources in a way that is consistent with their remit to collect, document and provide
access to cultural heritage.
[8] ‘HTML’ stands for HyperText Markup Language and is the major language of the Internet’s World Wide Web.
Web sites and web pages are written in HTML. This definition is given in HTML: An Interactive Tutorial for
Beginners. It is available online at <http://www.davesite.com/webstation/html>. Consulted 30 June 2004.
[9] Experimental archiving for PANDORA commenced in 1996, with regular archiving established by mid 1997.
The Archive contains files published electronically as early as 1987.
4.2.3.2 PANDORA has aesthetic significance because it preserves the appearance and functionality
(the ‘look and feel’) of publications and web sites, as well as their intellectual content, and the
evolution in presentation and format of items mounted on the Web.
4.2.3.3 PANDORA has social and spiritual significance. It includes web sites created by many
different groups and communities to express identity, communicate views, beliefs and
concerns, and to celebrate their contribution to the broader society. These include web sites
by and for indigenous peoples and multi-cultural communities from around the world.
PANDORA as a model for other archives
4.2.4
PANDORA is the first of its kind and other national and institutional archives are emulating it.
For instance, the Library of Congress has modelled the user interface for its MINERVA
Archive on PANDORA.
4.2.5
A feasibility study by the Wellcome Trust, in seeking a solution for the collection and
preservation of web sites of relevance to medical research, recommended that the Trust
‘Establish a pilot medical Web archiving project using the selective approach as
pioneered by the National Library of Australia…The pilot should consider using the
NLA’s PANDAS software for this archiving activity. This pilot could be run
independently or as part of a wider collaborative project with other partners.’[10]
4.2.6
The Wellcome Trust has now joined with other collecting institutions in the United Kingdom,
including the British Library, the National Archives, and the Scottish and Welsh National
Libraries, to form the UK Web Archiving Consortium. This Consortium is planning a
selective archive modelled in part on PANDORA and has contracted with the National Library
of Australia to use the PANDAS software for its web archiving program. Other national
libraries have also evaluated the software, or plan to, with a view to developing archives based
on similar principles using the software.
Online ‘visits’ to PANDORA
4.2.7
The majority of online visitors to PANDORA come from countries other than Australia. In
May 2004, of a total of 161,299 visits to the Archive, only 49,240 were from Australia. Apart
from 30,209 visits from ‘region unspecified’, the remainder (81,850) were from elsewhere in
the world, including North America, Europe, Asia and the Pacific Islands. As titles in the
Archive inevitably disappear from their live sites, the value of the Archive to researchers will
increase.
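The May 2004 figures in 4.2.7 can be checked with simple arithmetic. The sketch below uses only the numbers quoted in that paragraph; the variable names are illustrative and not taken from any Library system.

```python
# Visit counts for May 2004 as quoted in paragraph 4.2.7.
total_visits = 161_299
australia = 49_240
unspecified = 30_209

# Visits from identified regions outside Australia are what remains.
elsewhere = total_visits - australia - unspecified
assert elsewhere == 81_850  # agrees with the remainder quoted in 4.2.7

# Share of all visits attributable to each group.
for region, count in [
    ("Australia", australia),
    ("Region unspecified", unspecified),
    ("Elsewhere in the world", elsewhere),
]:
    print(f"{region}: {count:,} visits ({count / total_visits:.1%})")
```

On these figures, visits from identified regions outside Australia account for just over half of all visits, even when the ‘region unspecified’ visits are set aside.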
[10] Day, Michael. Collecting and Preserving the World Wide Web: A Feasibility Study Undertaken for the JISC
and Wellcome Trust. Bath: UKOLN, University of Bath, 2003. Page 3. Available online at
<http://www.jisc.ac.uk/uploaded_documents/archiving_feasibility.pdf>
Scope of PANDORA
4.2.8
Each of the PANDORA partners selects titles for the Archive according to stringent selection
guidelines that are published on the PANDORA web site. The National and State libraries
archive those publications and web sites relating to the published output of their jurisdictions.
ScreenSound Australia takes responsibility for sites relating to music and film; the Australian
War Memorial archives sites relating to Australian military history; and the newest partner, the
Australian Institute of Aboriginal and Torres Strait Islander Studies, archives the publications
and web sites of Australia’s Indigenous peoples. The National Library of Australia’s selection
guidelines are available at http://pandora.nla.gov.au/selectionguidelines.html.
4.2.9
The PANDORA Archive contains publications and web sites carefully selected for their
significance and long-term research value. It is estimated to comprise less than one per cent of
the Australian web domain.
4.2.10 It contains a wide range of publications and Web sites. High priority is placed on collecting
government publications and academic e-journals. In addition there are many other types of
sites. The following are just a few examples in a few categories:
Cultural activity
Bangarra Dance Theatre <http://pandora.nla.gov.au/tep/14134>
Sydney Film Festival <http://pandora.nla.gov.au/tep/25307>
Australian Girls’ Choir <http://pandora.nla.gov.au/tep/23206>
Community concerns
International Year of Volunteers <http://pandora.nla.gov.au/col/c5040>
Bali Bombing, 12 October, 2002 <http://pandora.nla.gov.au/col/c8200>
A Bill of Rights for the ACT? <http://pandora.nla.gov.au/tep/36559>
Scientific standards and research
Asbestos: Code of Practice for the Safe Removal of Asbestos <http://pandora.nla.gov.au/tep/34806>
Qualitative Research Journal <http://pandora.nla.gov.au/tep/34053>
Strategic Issues for Australian Gene Technology <http://pandora.nla.gov.au/tep/31208>
Politics and government
Australia’s Constitutional Convention <http://pandora.nla.gov.au/tep/10482>
1998 Federal Election Campaign <http://pandora.nla.gov.au/col/c4001>
Crikey <http://pandora.nla.gov.au/tep/13027>
Indigenous peoples
Native Title Conference, held Adelaide University, July 2001 <http://pandora.nla.gov.au/tep/32635>
Reconciliation Australia <http://pandora.nla.gov.au/tep/24362>
Zero Tolerance Policing: Its Background and Implications for Aboriginal People
<http://pandora.nla.gov.au/tep/25328>
Sport
Women’s Hockey Australia <http://pandora.nla.gov.au/tep/10785>
Independent Soccer Inquiry <http://pandora.nla.gov.au/tep/34957>
2003 Melbourne Cup Carnival <http://pandora.nla.gov.au/col/c8275>
Variety of online formats
4.2.11 The following are some examples of the variety of formats in PANDORA:

Ngapartji's virtual writers in residence <http://pandora.nla.gov.au/tep/10247>. This site
was archived during 1998 and 1999 and includes some early multimedia. It is no longer
available from the publisher’s site.

Official 1998 Mardi Gras netcast <http://pandora.nla.gov.au/tep/10023>. This site
documents the first net cast of the Mardi Gras in Sydney. Some of the links do not work,
but the videos of the net cast are there. It is no longer available from the publisher’s site.

Bangarra Dance Theatre < http://pandora.nla.gov.au/tep/14134>. This site contains
beautiful video clips of some of the company’s dances.

OnSecure <http://nla.gov.au/nla.arc-39351>. This is a dynamically generated database
site. Note that in the archive it is stored as static pages.
Comparison with other archives
4.2.12 The PANDORA Archive was one of the first archives of web publications to be established
anywhere in the world. From the beginning, the National Library of Australia and its partners
have been more adventurous than others in selecting and finding solutions for archiving
dynamic web sites, as well as static publications. The PANDORA Archive therefore contains
a much wider range of publications and web sites and illustrates a much richer range of
approaches to web publishing and the technologies that publishers use than other archives do.
As the only national archive in the world which actively collects dynamic web sites and
checks that their functionality has been captured, this Archive will provide a record of world
significance for early dynamic web publications and sites.
Influence on other archives
4.2.13 The PANDORA Archive has had a very strong influence on digital archiving worldwide, as is
evident from international studies in which it features prominently (see 3.2.17 & 3.2.19).
There are several reasons for this.

The Archive was one of the first in the world to be established;

The Archive is based on sound policy, procedure, and infrastructure development that is
well-documented and available from the PANDORA web site
<http://pandora.nla.gov.au/index.html>;

The Archive is accessible to anyone, anywhere in the world, which, as explained above, is
unusual for copyright reasons.

The National Library of Australia has adopted a more adventurous and inclusive selection
policy, and has been willing to tackle the more difficult dynamic formats. The Archive is
therefore seen as a model for what can be achieved;

The National Library of Australia has developed digital archiving system software, which,
because of the lack of alternative systems anywhere in the world, has excited interest in
agencies that plan to initiate digital archiving programs. The Library is providing access to
the software to other agencies for evaluation purposes;

The Archive is unique because it is built on a collaborative model.

The National Library of Australia has been willing to share the results of its experience and is
active in efforts to collaborate internationally. It has joined and is actively participating in the
International Internet Preservation Consortium.
4.2.14 Through its Charter on the Preservation of Digital Heritage11, UNESCO has acknowledged
that born-digital heritage available online is part of the world’s cultural heritage and that the
vulnerability of this material to rapid and complete loss needs to be addressed. The Charter was
adopted by member states during the 32nd session of the General Conference of UNESCO in
October 2003. UNESCO has supported the Charter with the publication of Guidelines for the
Preservation of Digital Heritage12 to assist member states to develop policies and procedures
on collecting and preserving their heritage in digital formats. The National Library of
Australia was contracted to undertake the consultative process required to formulate and write
these Guidelines because of its international standing, experience and knowledge gained
through the development of the PANDORA Archive. The Archive therefore has already been
of influence and value in world terms.
4.2.19 The National Libraries of Canada, Denmark and Sweden, and the Internet Archive in the
United States commenced archiving at about the same time as the National Library of
Australia, in the years 1995 to 1997. Canada, Denmark and Australia all took a selective
approach to archiving, while Sweden and the Internet Archive took a ‘whole domain’
approach. In the case of the Internet Archive, the ‘whole domain’ is, in theory, the whole of
the Internet.
4.2.15 Each of these archives and the approaches to archiving on which they are based have strengths
and weaknesses.
4.2.16 A selective approach to archiving enables libraries to achieve four important objectives:

Each item in the archive is quality assessed and functional to the fullest extent permitted by
current technical capabilities;

Each item in the archive can be fully catalogued and therefore can become part of the
national bibliography;

Each item in the archive can be made accessible via the Web immediately. In the case of
Australia, all but 144 are accessible now, and most of the remainder will be available within
five years, owing to the fact that permission to make publications available to the public via
the Web has been negotiated with the publishers;

What is in the archive is known and documented. Preservation needs can be analysed, the
risks assessed and preservation strategies formulated.
4.2.17 The chief disadvantage of the selective approach is that libraries are making subjective
judgments about the value of resources and what researchers of the future are likely to find
useful. The way that researchers will want to access, use and apply the potential of the Web is
still developing and the selective approach does not preserve the context for a given
publication or web site. In theory, then, the obvious advantage of the ‘whole of domain’
approach would seem to be that the whole domain is captured at periodic intervals, with
resources able to be seen in their broader context, with links to other documents retained.
4.2.18 In practice this whole domain advantage is flawed. Because whole domain harvests are
demanding in terms of computer time and storage, they are usually run at intervals of at least a
few months. Any publications, regardless of their significance, which come into being and
change or disappear in the interim, are missed. Because of the huge volume of publications
involved, quality control checks cannot be made on more than a very small sample of titles.
The National Library of Australia’s experience would suggest that at least 40 per cent of
harvested titles will be incomplete or defective in some way.
11 UNESCO. (2003) Charter on the Preservation of Digital Heritage. URL:
<http://portal.unesco.org/ci/ev.php?URL_ID=13366&URL_DO=DO_TOPIC&URL_SECTION=201&reload=1067609511>
12 UNESCO. (2003) Guidelines for the Preservation of Digital Heritage. URL:
<http://portal.unesco.org/ci/ev/php?URL_ID=8967&URL_DO=DO_TOPIC&URL_SECTION=201&reload=1069628134>.
4.2.19 Commercial sites that employ passwords or other inhibitors to access will not be accessible to
harvesting robots and therefore will not be gathered. Databases and other dynamically driven
sites will also be absent from a whole domain archive.
4.2.20 Because copyright and legal deposit law has been slow to respond to the new conditions
pertaining to the online environment, providing access to archived publications is
problematical, unless permission is negotiated with publishers. This is impossible when large
volumes of sites are being archived. None of the national libraries engaged in whole domain
harvesting provide networked access to their archives. If access is provided at all, it is limited
to a single PC in the library’s reading room.
4.2.21 The Internet Archive13 is a not-for-profit initiative of Brewster Kahle, the founder of Alexa, a
company that has been indexing the World Wide Web since 1996. Alexa donates data to the
Internet Archive. The aim is to preserve as much of the Web as possible by copying web
pages and storing them. It holds billions of web pages. Access is provided to this huge
archive via the Wayback machine on the Internet Archive site.
4.2.22 Although it is not possible to quantify it at this time, the Internet Archive contains a lot of
Australian publications and web sites. It is likely that it contains more Australian material
than the PANDORA Archive does and as such is a valuable resource for researchers who are
looking for online publications that are no longer available from the publishers’ sites.
However, it is subject to all of the disadvantages of a comprehensive or ‘whole domain’
archive, as outlined in paragraphs 4.2.23-24. The Internet Archive gets around the usual
copyright problem by gathering sites without permission and offering to take them down if
copyright owners ask it to do so.
4.2.23 In contrast to the Australian content in the Internet Archive, the PANDORA Archive is a
finely honed national collection of significant publications and web sites developed by
specialists in collection building with attention paid to the completeness and functionality of
titles. Access, as explained elsewhere, is provided through traditional library catalogues, as
well as through mechanisms such as browse lists and search engines.
4.2.24 The National Library of Australia and the National Library and Archives of Canada have very
similar archiving programs in some ways. Both follow the selective approach, both catalogue
the archived titles, and both archives are freely available to anyone in the world, except for a
small proportion of restricted titles. The Canadian Archive is much larger than PANDORA,
but is largely limited to static, text-based documents, predominantly government and
commercial monographs and serials.
4.2.25 The archive of the National Library of Denmark was built as a result of legal deposit
legislation which required the deposit of static online publications. This archive is accessible
on a limited basis from a single PC in each of the National Library and the university library.
4.3
Criteria of (a) time (b) place (c) people (d) subject and theme (e) form and style.
Time
4.3.1
The Archive documents the early years of publication on the Internet. It contains not only
selected publications of individual and substantial research value that appear in no other
format, but also Web sites that illustrate how Australians were using this new method of
communication from 1996 onwards to promulgate their views and share their experiences. It
documents events, crises and social change, for example, through collections of sites about the
Sydney Olympics, the Bali bombing, and the Republican debate.
13 The Internet Archive site is available at http://www.archive.org/index.php
Place
4.3.2
To be selected for inclusion in the Archive, a publication or web site should be about
Australia, or be on a subject of significance and relevance to Australia and be written by an
Australian author, or be written by an Australian of recognised authority and constitute a
contribution to international knowledge. Its focus is therefore Australia and the Australian
people. However, Australians participate in world affairs and therefore there are also sites of
world and regional interest, for example the Sydney 2000: Official Site of the Sydney 2000
Olympic Games <http://pandora.nla.gov.au/tep/10194> and INTERFET Peace Keeping:
International Force East Timor <http://pandora.nla.gov.au/tep/10661>.
People
4.3.3
As well as publications of individual and substantial research value, an important component
of the Archive is formed by Web sites that collectively document Australian society and
people. There are the sites of well known as well as ‘ordinary’ Australians:
Ian Thorpe (sportsman) <http://pandora.nla.gov.au/tep/10846>;
Pauline Hanson (politician) <http://pandora.nla.gov.au/tep/33908>;
Kate Ceberano (musician) <http://pandora.nla.gov.au/tep/10463>;
Albert Chapman (mineralogist) <http://pandora.nla.gov.au/tep/37671>;
Bob Buick (Vietnam veteran) <http://pandora.nla.gov.au/tep/10085>;
Nancy Crick (euthanasia campaigner) <http://pandora.nla.gov.au/tep/24513>;
Trishan Ponnamperuma (11 year old) <http://pandora.nla.gov.au/tep/15005>.
4.3.4
There are sites via which Australians express their views
(Australians for fairer tax) <http://pandora.nla.gov.au/tep/10120>;
define their experiences
(Bushfires, Canberra, ACT, Jan. 2003) <http://pandora.nla.gov.au/col/c8075>;
celebrate their achievements
(It’s an Honour) <http://pandora.nla.gov.au/tep/37444>;
and their love of sport
(Sports – Australian Internet Sites) <http://pandora.nla.gov.au/col/c4010>.
4.3.5
Particular attention has been paid to archiving the Web sites of Australian ethnic communities
and the Official Page of the Polish Community in Australia
<http://pandora.nla.gov.au/tep/32080> is one of the most heavily used sites in the Archive. The publisher’s site is no longer
actively maintained: it points to the archived version. This is just one of approximately 130
archived web sites of communities originating from Europe, Asia, Africa and North and South
America, which document the experience of migrating to and living in Australia.
Subject and theme
4.3.6
Each of the PANDORA partners selects titles for the Archive according to selection guidelines
that are published on the PANDORA Web site. The National and State libraries archive those
publications and web sites relating to the published output of their jurisdictions. This means
that a wide range of subject matter is archived. In addition, the Archive receives the benefit of
subject specialists. ScreenSound Australia takes responsibility for sites relating to music and
film, the Australian War Memorial archives sites relating to Australian military history, and
the Australian Institute of Aboriginal and Torres Strait Islander Studies adds sites relating to
Australia’s Indigenous peoples.
4.3.7
In addition to the publications of individual substantial research value, the National Library
actively collects sites on a wide range of subjects, events and topical issues that document
Australian life as comprehensively as possible. Details of the National Library’s program for
collecting this category of material are available in Appendix 2B of Online Australian
Publications: Selection Guidelines for Archiving and Preservation by the National Library of
Australia <http://pandora.nla.gov.au/selectionguidelines.html>.
4.3.8
In addition, the Library actively collects sites documenting events and topical issues as they
arise, for example, Federal elections, Australian participation in the Iraq war, and refugees.
Form and style
4.3.9
Internationally the Archive is recognised as a key exemplar of a selective national digital
archive. Reasons for this have already been referred to under 4.2.17. The Library of Congress
stated publicly at the International Web Archiving Symposium in Tokyo in January 2002 that
it had modelled the user interface for its MINERVA Archive on PANDORA because it could
not determine a better way to do it.
4.4
Issues of rarity, integrity, threat and management
Rarity
4.4.1
PANDORA is the unique and successful response of a national library to the need to collect
and preserve a nation’s significant online publications and web sites in the incunabula period
of this new mode of communication and publication, the World Wide Web. It is a response
that was formulated when no other solutions for archiving the full range of formats being
published on the Web had been pioneered. As explained in Section 4.2, it has become a model
for a number of other national libraries and research organisations establishing web archiving
programs.
4.4.2
The individual exploratory phase of national libraries seeking solutions to web archiving is
coming to an end. Only a handful took up the challenge in the mid to late 1990s, the early
days of the Web, including the National Libraries of Australia, Canada, Sweden, Norway,
Denmark and Finland. The Library of Congress, the National Diet Library of Japan and some
European national libraries followed a little later. National libraries now recognise the
benefits of collaboration and are working to develop common infrastructure, tools and
methodologies, which has led to the establishment of the International Internet Preservation
Consortium, in which the National Library of Australia is an active participant. This means
that within the next couple of years it is likely that there will be a homogeneous and
well-coordinated international response to the need to collect and preserve online heritage,
which is very much needed.
4.4.3
As web archiving moves on and reaches new stages of sophistication, it is important to
recognise and record the early examples which influenced development in the field, of which
PANDORA was a leader.
4.4.4
It is estimated that, for about six per cent of titles in the Archive, the publisher’s site has
ceased to exist altogether on the Internet. It is almost certain that for many of these titles the
copy in the PANDORA Archive is the only extant complete version. For many more titles the
publisher’s site continues to exist but has changed, with earlier content being available only in
PANDORA. As time passes, an increasing proportion of titles will no longer be available at
all.
4.4.5
In the case of publications published in print, a number of libraries would collect any given
title, and if a particular library missed acquiring a copy, there would still be a second chance
through the second hand market. There is no such second chance for online publications.
4.4.6
PANDORA partners are collecting online publications collaboratively, one copy for the whole
of Australia. Apart from Our Digital Island <http://odi.statelibrary.tas.gov.au>, the archive at
the State Library of Tasmania that collects Tasmanian publications, there is no other archive
for Australian publications. A few universities have begun to establish e-print archives but
these are still a long way from adequately covering publications of the tertiary education
sector.
Integrity
4.4.7
The National Library has a number of policies and procedures in place to protect the integrity
of the Archive and its contents.
4.4.8
At the title level, in copying a publication or web site into the Archive, the policy is to
maintain its ‘look and feel’, that is, its presentation, content and functionality, to the fullest
extent possible. In most cases this is achievable. In some cases, however, because of the
technical set-up of a site or because of the limitations of current harvesting technology, it is not
possible to exactly reproduce a site. Some functionality may be missing, for instance, the
ability to conduct a search of an archive of back issues on the web site of an e-journal.
Functions that involve interaction with publishers’ sites have to be disabled.
4.4.9
Where it is not possible to replicate exactly the ‘look and feel’ and functionality of a title with
current technology, the Library takes a pragmatic view and argues that it is better to
compromise on these aspects for the benefit of being able to preserve the intellectual content.
It is essential that the intellectual content be preserved exactly as produced by the publisher.
All sites are quality checked and are accurate copies of the publishers’ sites at time of
archiving. The date of archiving of each instance is recorded by the system and is clearly
indicated for each instance14.
4.4.10 Preservation activity to keep titles accessible as hardware and software changes may also
necessitate some changes to ‘look and feel’ and functionality. Once again, preservation of the
intellectual content exactly as produced by the publisher will be the first concern.
4.4.11 At least three copies of each instance of a title are kept:

The preservation master, which is derived from the raw output of harvested files and
associated work logs before human or machine interaction occurs. It is therefore an exact
copy of what is downloaded from the publisher’s site.

The access master, which is the set of harvested files that have undergone quality assurance
procedures (modification of links, addition of missing images, etc.) to make the instance
suitable for public display.

The access copy, which is served to users of the Archive.
4.4.12 The preservation and access masters are stored separately from the access copy on the
Library’s secure Digital Object Storage System (DOSS). Backup tapes of both the DOSS and
the PANDORA Archive server are maintained, including copies that are stored off-site.
4.4.13 These practices contribute to ensuring the integrity of the Archive in the following ways:

Should the Archive fail or be destroyed or corrupted, back up copies would enable the
restoration of the Archive;

Should a hacker succeed in tampering with one of the copies of an instance, it is highly
unlikely that s/he would succeed in tampering with all copies in the different storage spaces
and on all backup tapes, and the instance could be restored from the intact copies;

Should quality control or preservation activity change an instance to an unacceptable degree,
or alter the intellectual content at all, the preservation master can be returned to for an exact
replication of the title on the publisher’s site.
14 An ‘instance’ is a single gathering of a title. It includes the gathering of a monograph that has been archived
once only, the first gathering of a serial title or integrating title (for example a Web site that changes over time),
and all subsequent gatherings.
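The principle behind restoring a tampered or corrupted copy from the intact ones can be illustrated with a short sketch. It is written in Python and rests on assumptions that are not drawn from the Library's systems: copies of the same master file are compared by SHA-256 digest, the content held by a majority of storage locations is treated as intact, and the location names (`doss`, `onsite_tape`, `offsite_tape`) and function names are hypothetical.

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def find_intact(copies: dict[str, bytes]) -> tuple[bytes, list[str]]:
    """Return the majority content and the names of the copies that differ from it.

    Assumption: a copy is considered intact when its digest matches the digest
    shared by more than half of the storage locations.
    """
    digests = {name: digest(data) for name, data in copies.items()}
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(copies) // 2:
        raise ValueError("no majority among copies; manual inspection needed")
    intact = next(data for name, data in copies.items()
                  if digests[name] == majority_digest)
    damaged = [name for name, d in digests.items() if d != majority_digest]
    return intact, damaged


def restore(copies: dict[str, bytes]) -> dict[str, bytes]:
    """Return a new mapping in which every damaged copy is replaced by the intact content."""
    intact, damaged = find_intact(copies)
    repaired = dict(copies)
    for name in damaged:
        repaired[name] = intact
    return repaired


# Hypothetical example: one of three stored copies has been altered.
example = {
    "doss": b"<html>archived page</html>",
    "onsite_tape": b"<html>archived page</html>",
    "offsite_tape": b"<html>tampered page</html>",
}
print(find_intact(example)[1])
```

The comparison applies only to copies of the same master. The preservation master and the access master differ by design, since the access master has undergone quality assurance changes.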
4.4.14 Another aspect of integrity is persistence. Each item in PANDORA, from title level right
down through instances and parts of instances to component files, has a unique persistent
identifier automatically assigned by the digital archiving system. This enables authors to cite
works and parts of works in the Archive using the appropriate persistent identifier. Readers
can return to the cited item in the Archive again and again, confident that it will remain there
persistently and that it will remain the same.
Threat
4.4.15 The Library intends to provide perpetual access to the PANDORA Archive. It is a collection
of Internet publications, many of which are complex digital objects, comprising a number of
different file types. This poses a significant challenge from the point of view of preservation,
as the software and hardware required to display them change relatively quickly.
Preservation strategies for the Archive are outlined in section 6, Management Plan, and
the National Library’s Digital Preservation Policy is attached at Appendix 1.
4.4.16 The Library has recently completed a risk assessment of the Archive. One of the significant
risks is the cost of digital preservation and the possibility that the Library may not have the
funds to carry out the necessary preservation action.
4.4.17 ‘Threat’ has another aspect in relation to the PANDORA Archive – not a threat to the material
in the Archive but the threat to significant material that cannot be brought into the Archive
because of insufficient resources to do so. Archiving and managing online publications is
labour-intensive and therefore costly.
4.4.18 Despite their best efforts, the PANDORA partners are managing to archive only a small
proportion of what ideally should be contained in the Archive. The National Library
continues to research improved methods of archiving to increase the volume that can be
handled by the available staff. It is currently investigating methods for automating
identification, selection, description, archiving and quality control of Commonwealth
government publications, which would greatly improve the rate of intake.
4.4.19 There are legal, political, social, organisational and financial factors which place items in the
Archive, as well as those not yet archived, at risk. There is still a low level of recognition
among creators of information, publishers (including commercial publishers), government
agencies and politicians of the importance of keeping online information accessible for future
generations. The absence of legal deposit legislation at the Commonwealth level is a major
impediment to archiving and retention of publications. None of the agencies contributing to
PANDORA are adequately funded to do the work and none, therefore, are able to archive to
the full extent of their selection guidelines.
4.4.20 Listing on the Memory of the World Register would demonstrate the social and cultural
importance of this part of our documentary heritage. It would lift the profile of the Archive,
draw public attention to the importance of keeping our online heritage accessible, and may
make it easier to obtain funds for both collection building and preservation.
5
LEGAL INFORMATION
5.1
Owner of the documentary heritage (name and contact details)
The National Library of Australia is the owner of PANDORA and its technical infrastructure.
Contact details
Pam Gatenby
Assistant Director General
Collections Management
Address
National Library of Australia
Parkes Place
ACT 2600
Telephone 02 6262 1672
Fax 02 6273 2545
Email pgatenby@nla.gov.au
5.2
Custodian of the documentary heritage (name and contact details, if different to owner)
The National Library of Australia is the custodian of PANDORA and its technical
infrastructure.
Contact details are the same as for owner.
5.3
Legal status:
(a) Category of ownership
5.3.1
Public institution
(b) Accessibility
5.3.2
Titles in the PANDORA Archive are accessible via the Internet at
<http://pandora.nla.gov.au/index.html>. Most titles are freely available, to anyone anywhere
in the world with access to the Web.
5.3.3
Access is restricted for 144 titles, usually because the title is still commercially viable and it is
necessary to protect the publishers’ revenue for a period of time that is negotiated with the
publisher. Examples include:
Safety at Work http://pandora.nla.gov.au/tep/14079;
Federal Law Review http://pandora.nla.gov.au/tep/11156; and
Justinian: E-news http://pandora.nla.gov.au/tep/10397.
Restrictions currently range in duration from three months to 99 years, with 103 being
restricted for less than five years. Thirty titles are restricted because of sensitive content and
may only be viewed after successful application to the National Library for a log in and
password.
5.3.4
People can find out about titles that are in the Archive by searching the National Bibliographic
Database (NBD) and partners’ local catalogues. Access is provided via hotlinks in the
catalogue record. Access is also available via subject and title lists on the PANDORA home
page, and a search engine that indexes the Archive. Search engines, such as Google and
Yahoo, index the Archive down to the level of individual titles, but not the contents of the
titles.
5.3.5
The policy of providing full catalogue records in the NBD for each title in PANDORA means
that online publications and web sites become part of the national bibliography, as
recommended by the International Federation of Library Associations15. The Archive is one
of the few in the world that routinely include records for online publications in the national
bibliography.
15 International Federation of Library Associations (IFLA). The final recommendations of the International
Conference on National Bibliographic Services, 1998. URL: http://www.ifla.org/VI/3/icnbs/fina.htm.
Consulted 3 May 2004
(c) Copyright status
5.3.6
Copyright in titles held in PANDORA remains with the original copyright owner. Where the
publisher provides a copyright statement within the publication or web site, a link to it is
provided from within PANDORA. General information about copyright is also provided.
5.3.7
Given the lack of legal deposit legislation for online publications, partners negotiate with
publishers for permission to copy their publications into the Archive, to provide access to
them, and to make copies for the purposes of preservation.
(d) Responsible administration
5.3.8
The National Library of Australia administers PANDORA. Having established the Archive
and developed the technical infrastructure to support acquisition, description, storage,
management, preservation and access, the Library invited other collecting agencies to
contribute to it (the partner organisations are listed in footnote 1). Nine other partners
contribute publications and web sites to the Archive using PANDAS, the web-based software
developed by the National Library. The Archive is stored centrally at the National Library.
5.3.9
Partners participate in administration of the Archive through the PANDORA Consultative
Committee, which discusses policy and strategy, and a committee of operational staff that
discusses matters of day-to-day archiving. Each committee has an associated discussion list
and teleconferences are held to discuss matters of business.
(e) Other factors: For example, is any institution required by law to preserve the documentary
heritage in this nomination?
5.3.10 The National Library of Australia is required to collect a comprehensive collection of
Australian documentary material under the National Library Act 1960. Each of the other
partners is required by the legislation under which it is established to develop collections of
documentary heritage relevant to their jurisdictions.
5.3.11 Legal deposit provisions assist the National and State libraries to carry out their mandates for
print publications. At the Federal level, however, the legal deposit provisions of the Copyright
Act 1968 do not cover electronic publications, so the National Library and its partners
negotiate permission to archive publications and web sites with each publisher.
5.3.12 A licence between the Commonwealth of Australia through the Department of
Communications, Information Technology and the Arts and the National Library permits the
Library to archive Commonwealth copyright material published on specified Commonwealth
government web sites.
5.3.13 The State library partners are governed also by State legislation dealing with legal deposit.
None of them has legislation that unambiguously supports the deposit of online publications.
5.3.14 The State Library of New South Wales is aided by the Premier’s Department Memorandum
No. 2000 – 15, Access to Published Information – Laws, Policy and Guidelines, which
mandates the deposit of copies of all government publications, in every format, with
designated institutions, including the State Library.
5.3.15 In Western Australia online government publications are required to be deposited with the
State Library under a Premier’s Circular.
6
MANAGEMENT PLAN
6.1
The management plan is made up of the following documents, which assist the safe and
effective management of PANDORA for preservation and access:

Online Australian Publications: Selection Guidelines for Archiving and Preservation by
the National Library of Australia. <http://pandora.nla.gov.au/selectionguidelines.html>

A Digital Preservation Policy for the National Library of Australia
<http://www.nla.gov.au/policy/digpres.html> which includes the PANDORA Archive.
This document is attached at Appendix 1.
6.1.2
To date there are no completely dependable preservation strategies available for online
formats. However, the Library takes a proactive approach to digital preservation. It
keeps abreast of international research and development in the area and has itself
contributed to this research, for example in the areas of preservation metadata and
migration of HTML files.
6.1.3
To preserve access to titles in the PANDORA Archive the Library will employ:
some technology preservation, including maintenance of software and even some
hardware;
negotiation with publishers to supply stable source files of some streaming or dynamic
formats;
migration strategies for those file formats which correspond to compatible new formats
and which are amenable to mass conversion;
emulators, if they can be found or developed, for some file formats; and
the simple keeping and refreshing of some files, not amenable to migration or emulation, in the
hope that a suitable access pathway will emerge.
6.1.4
The Library has conducted a risk assessment of its digital collections, with particular focus on
PANDORA. This assessment identifies in detail the risks involved in specific file types that
make up the complex digital objects which comprise the PANDORA Archive and
recommends actions that will be taken to minimise those risks.
6.1.5
As detailed in sections 4.1 and 4.4, the Library keeps at least three copies of each ‘instance’ in
the Archive and stores them separately. This practice is part of the overall preservation
strategy, ensuring that the Archive or parts of it could be restored in the event of loss or
damage.
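As an illustration only, restoring a lost or damaged copy from one of the separately stored copies could be sketched as follows. The function names, the use of SHA-256 checksums and the file layout are assumptions made for this sketch; they do not describe the Library's actual storage systems.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_damaged_copies(copies: list[Path], expected_digest: str) -> list[Path]:
    """Overwrite any missing or corrupted copy from a copy that still matches.

    Returns the copies that were restored. Raises if no intact copy remains.
    (Hypothetical helper: the checksum is assumed to have been recorded
    when the instance was first archived.)
    """
    intact = [p for p in copies if p.exists() and sha256_of(p) == expected_digest]
    if not intact:
        raise RuntimeError("no intact copy available to restore from")
    restored = []
    for path in copies:
        if path not in intact:
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(intact[0].read_bytes())
            restored.append(path)
    return restored
```

Under these assumptions, one intact copy is enough to rebuild the others, which is why the copies are kept in separate locations.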
7
CONSULTATION
7.1
The National Library of Australia is both the owner and the custodian of PANDORA. It has
consulted the PANDORA partners, who were unanimously in favour of PANDORA being nominated
for the Memory of the World Register.
7.1.2
PANDORA has been nominated for inscription on the Australian Memory of the World
Register and the Australian Committee recommended that it be nominated for the World
Register.
PART B – SUBSIDIARY INFORMATION
8
ASSESSMENT OF RISK
8.1
As yet there are no known completely reliable strategies for digital preservation, and this
places the content of the PANDORA Archive at some risk. However, as detailed under sections 4.1
and 4.4 above, this risk is being assessed, documented and actively managed. One of the major threats to
the Archive is the cost of archiving and digital preservation and the fact that no additional funds have
been made available to any of the contributing partners for archiving and preservation work. Listing
on the World Register would draw attention to the importance of this type of documentary heritage in
Australia and may increase the likelihood of funding by government.
8.1.2
The National Library is actively involved in research in aspects of preserving digital objects
and keeps abreast of research being carried out by others around the world.
9
ASSESSMENT OF PRESERVATION
9.1
The PANDORA Archive is subject to the same rigorous management and preservation
strategies as all of the National Library’s other digital collections.
Careful documentation and collection control
9.1.2
All titles archived in PANDORA are catalogued and a record goes into partner agencies’ own
catalogues and into the National Bibliographic Database. In addition, the PANDORA Digital
Archiving System (PANDAS) is used to record other information about the title, including
contact with publishers, date/s of archiving, any special matters relating to the title,
restrictions, and some preservation metadata. Correspondence with publishers, such as
that relating to permission to archive, is kept on files in partners’ records
management systems.
9.1.3
The Archive is supported by PANDAS, which was built by the National Library
specifically to aid in the management, control and preservation of the collection. One function
of PANDAS is to allocate a persistent identifier to every title registered in the system. This
assists with the unique identification of titles and persistent access to them.
Storage environments
9.1.4
As explained in sections 4.1 and 4.4, two copies of each instance archived in PANDORA are
stored in the Library’s Digital Object Storage System (DOSS), under optimal conditions for
security and preservation. Regular back-ups are made in line with standard practice, and
copies or back-ups are stored off-site.
“Prevention is better than cure”
9.1.5
It is impossible to prevent the greatest risk to digital collections, that is, the changing hardware
and software on which they are dependent. However, the National Library endeavours to be
as prepared as possible for this eventuality. It has conducted a risk assessment of its digital
collections, with special emphasis on PANDORA, which is the most complex. PANDAS has
been designed to collect some preservation metadata and is currently being enhanced to collect
more and make it more accessible. As explained above, the Library takes other action to
mitigate possible disasters such as loss or corruption of data.
Conserving an original document
9.1.6
In the online world it is not the original document that is being preserved, but a copy. In
making this copy, the PANDORA partners pay particular attention to ensuring the intellectual
content is preserved. In addition, it is our policy to maintain the ‘look and feel’ for as long
as possible. This may become difficult in some cases as software and hardware change and
we have to resort to preservation strategies such as migration. Keeping preservation metadata
is an important part of this process.
Content migration or reformatting
9.1.7
This digital archive will inevitably have to resort to migration or reformatting in order to
maintain long-term access to its contents. The National Library is aware of the risks
associated with loss of information and the closing off of future options. It will do its utmost
to mitigate these risks. One of its strategies, as mentioned in section 4.1, is to keep a copy of
the publication or web site as it has been downloaded from the publisher’s site, without any
change. If in future it is found that a preservation strategy or series of them has led to closing
off of future options, then we can always go back to this original copy and begin again with
later technology.
Putting long-term preservation at risk in order to satisfy short-term access demand
9.1.8
This is not relevant to digital materials, as long as preservation copies are kept, as is the case
for items in PANDORA.
One size doesn’t fit all
9.1.9
This is of particular relevance to a collection such as PANDORA, where even a single title
often consists of many different file types. The hardware and software required to display the
different file types may become obsolete at different times and require individual intervention.
This makes preservation of the contents of PANDORA a complex and expensive activity. The
National Library has taken a significant step towards dealing with this by conducting a
detailed risk assessment on all categories of formats contained in the Archive, which will lead
to the formulation of preservation strategies for each one.
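To illustrate why a single title may need several preservation decisions, the following sketch counts the file types within one archived title and lists those not yet covered by a risk assessment. It is a minimal example under stated assumptions: the function names are hypothetical, and identifying formats by file extension is a simplification chosen for this sketch, not the method used in the Library's risk assessment.

```python
from collections import Counter
from pathlib import PurePosixPath


def format_inventory(file_paths):
    """Count files per lower-cased extension; files without one count as '(none)'."""
    counts = Counter()
    for name in file_paths:
        suffix = PurePosixPath(name).suffix.lower()
        counts[suffix or "(none)"] += 1
    return counts


def formats_needing_review(inventory, assessed_formats):
    """Return extensions present in the title that have no risk assessment yet."""
    return sorted(ext for ext in inventory if ext not in assessed_formats)


# Example: one title made up of several file types (invented file names).
title_files = ["index.html", "img/logo.GIF", "docs/report.pdf", "img/map.gif"]
inventory = format_inventory(title_files)
unreviewed = formats_needing_review(inventory, {".html", ".gif"})
```

Each extension in `unreviewed` would represent a format for which a preservation strategy still has to be formulated.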
Cooperation is essential
9.1.10 As already explained in the nomination, the National Library cooperates with nine partners to
build the PANDORA Archive. It has also coordinated the development of UNESCO’s
Guidelines for the Preservation of Digital Heritage, which were designed to assist countries to
develop policies and procedures on collecting and preserving digital heritage. In addition, the
National Library is a foundation member of the International Internet Preservation
Consortium, which is a consortium of national libraries working together to share knowledge,
costs and development effort.
Traditional knowledge
9.1.11 Australian Indigenous peoples have been quick to use the Internet to record their culture and
strengthen their identity. PANDORA already contains 197 sites by or about Indigenous
culture, and the newest PANDORA partner, the Australian Institute of Aboriginal and Torres
Strait Islander Studies, will add to this collection through its specialised knowledge of and
contacts in this sector.
Standard of professionalism
9.1.12 The National Library is committed to active development of its staff and also supports them in
making opportunities to enhance their skills and professional abilities. It provides training and
ongoing support to the staff of PANDORA partners who are engaged in this web archiving
program. It also passes on to the staff of many other institutions in Australia and around the
world the knowledge and expertise that it has developed in web archiving. This is done
through documentation of its policies, procedures and guidelines on its web site, hosting visits
from agencies who wish to learn about our practices, the UNESCO guidelines mentioned
above in 4.2.18, making available the PANDAS software for evaluation by other agencies, and
by organising the Archiving Web Resources International Conference in November 2004.
PART C – LODGEMENT
This nomination is lodged by:
(Please print name) Janice Lillian Fullerton
(Signature)
(Date) 30 June 2004
Appendix 1
A Digital Preservation Policy for the National Library of Australia
Purpose
This policy statement indicates the directions the National Library of Australia intends to take in
preserving its own digital collections, and in collaborating with others to enable the preservation of
other digital information resources likely to be of value to NLA users.
The objectives of the Library’s digital preservation activities
The National Library’s preservation role is guided by its key objective to preserve and maintain all
Australian and significant non-Australian library materials to ensure they are available for current and
future use. This objective applies to both digital and non-digital information resources, although the
Library recognises that it will use different methods and draw on different skills, procedures and
partnerships, for managing digital and non-digital collections.
The National Library also seeks to help others preserve the Australian information resources for which
they accept responsibility.
The nature of the Library’s digital collections
The National Library began acquiring digital information resources of enduring value in the mid-1980s or earlier. By mid-2001 the digital information resources for which the Library accepts some
level of preservation responsibility included:
Australian online resources originally published on the Internet and selected for inclusion in
the Library’s PANDORA archive ;
Australian digital publications originally published on physical carriers such as diskettes and
CD-ROMs ;
Digital audio collections created as part of the Library’s Oral History collections ;
Preservation master digital copies of analogue material from the Library’s collections (and in
some cases, from the collections of other institutions) resulting from digitisation programs ;
Computer files on physical carriers such as diskettes forming part of various Manuscript
collections ;
Original digital images forming part of various Pictorial collections ;
Digital mapping products chosen for long-term retention by the Maps Collection ;
Computer files containing transcripts of the Library’s Oral History collections ;
The Library’s corporate records in digital form ;
Components of the Library’s website chosen for ongoing retention ;
Digital materials from regional partners where the Library accepts a preservation role ;
Metadata records of information resources.
Over time, this list of resource types can be expected to change in response to new business needs and
opportunities such as domain-wide web harvesting, archiving of email discussion lists, and new forms
of electronic communication.
The Library maintains various other policy documents and guidelines relevant to the way digital
resources are created, selected, acquired, described and accessed, which can be found at
http://www.nla.gov.au/policy/.
The challenges of keeping digital information resources accessible
The Library defines digital preservation as the processes involved in maintaining, and if necessary,
recovering accessibility to digital information resources.
Ongoing accessibility of digital resources is threatened by many challenges. The challenges include:
The volume and growth in the amount of material to be maintained ;
Widespread use of relatively unstable media ;
Rapid changes in the availability of hardware, software and other technology required for
access ;
The diverse and frequently changing range of file formats and standards ;
Uncertainty about the significant properties that must be maintained for different digital
resources ;
The recurrent, critical and demanding nature of the threats to accessibility ;
Uncertainty about strategies and techniques for addressing the threats ;
The high costs of taking action ;
Administrative complexities in ensuring timely and cost-effective action is taken over very
long periods of time ;
New models of ‘ownership’ that impose property and other rights-based constraints ;
The as-yet ambiguous intentions and capabilities of a range of agents with a potential impact
on accessibility.
Broad directions for the Library’s digital collections
Scope
The Library intends to preserve all the digital materials covered by this policy. However, it is likely
that the Library will need to allocate priorities for action, based on the relative significance of
particular materials and the technical complexity of preserving access to them.
Models
The Library believes its digital archiving and preservation objectives will be best achieved by
developing practices that comply with an adequate, coherent and widely understood framework for
reliable, accountable and manageable digital archives.
The Library will use the broad understandings and concepts embodied in the Open Archival
Information System (OAIS) Reference Model, when released as an agreed ISO Standard, as a
conceptual check for its own archive construction and management. In developing systems and
infrastructure to manage its digital collections, the Library will operate within the principles of reliable
digital repositories as defined by relevant international standards and best practices.
The Library also recognises the value of practical experiments in developing useful new approaches.
The Library will continue to develop process models addressing its particular business needs, clearly
identifying and articulating points at which its practice does not comply with the OAIS Reference
Model, and why.
Preserving accessibility
A key concept of the OAIS model is that of the “archival information package”, consisting of the
digital object itself and all the information required to understand, present, and manage it.
In preserving the accessibility of digital resources, the Library will attend to three aspects of
accessibility:
Maintaining the archival information package : the byte-stream which constitutes the digital
object and the information needed to present it as a meaningful reproduction of the originally
presented digital object ;
Maintaining means of accessing an acceptable presentation of the digital object ; and
Maintaining the ability to locate the digital object reliably.
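The three aspects above can be pictured with a small data structure. This is a sketch only: the class and attribute names are assumptions made for illustration, and neither the OAIS model nor this policy prescribes them.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ArchivalInformationPackage:
    """Hypothetical, simplified package: a byte-stream plus what is needed to use it."""

    identifier: str    # supports locating the object reliably
    content: bytes     # the byte-stream that constitutes the digital object
    representation_info: dict = field(default_factory=dict)  # e.g. format, software needed
    digest: str = ""   # checksum recorded when the package is created

    def __post_init__(self):
        # Record a SHA-256 checksum once, unless one was supplied.
        if not self.digest:
            self.digest = hashlib.sha256(self.content).hexdigest()

    def is_intact(self) -> bool:
        """True if the byte-stream still matches the recorded checksum."""
        return hashlib.sha256(self.content).hexdigest() == self.digest


# Example with invented values.
package = ArchivalInformationPackage(
    identifier="example-id-1",
    content=b"<html>archived page</html>",
    representation_info={"format": "text/html"},
)
```

Here `content` and `digest` stand for maintaining the byte-stream, `representation_info` for the information needed to present it, and `identifier` for the ability to locate it.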
Implementation - NLA digital collections
In implementing this policy with regard to its own collections, the Library will:
Monitor the preservation implications of any systems designed or put in place to manage its
digital collections ;
Apply priorities for preservation in an accountable manner in accord with publicly available
guidelines ;
Store and manage digital resources in ways that secure the integrity of the byte-stream,
including appropriate automated checking, archiving and back-up regimes with high levels
of redundancy and security, and with best practice disaster preparedness and recovery
procedures ;
Document its collections, including file formats, their software and hardware dependencies,
and any features needing to be accommodated ;
Define the significant properties (such as formatting/ “look and feel”, functionality, and
information content) that need to be preserved for particular classes of resources ;
Record preservation metadata that facilitate effective and efficient management, and enable
the significant properties defined for specific resources to be re-presented ;
Understand and document the threats to which the accessibility of its collections will be
exposed ;
Develop and monitor indicators of threats, aiming to understand when preservation action is
needed with sufficient lead time to allow effective, manageable action to be taken ;
Declare its preservation intentions responsibly, realistically, and explicitly, and regularly
report on its performance, including classes of resources it is unable to maintain or to which
it cannot currently provide access ;
Develop and apply appropriate accessibility pathways for specific collections and formats.
These pathways must:
provide access to the defined significant properties ;
maintain evidence of authenticity ;
comply with intellectual property rights and with other legal and moral rights
related to copying, storage, modification and use of specific resources ;
be cost-effective ; and
support ongoing access and preservation over time.
At this stage, the Library believes it will need to use a range of approaches to maintain access to all the
digital resources in its care. It assumes that format migration will be an effective pathway for large
numbers of files in ubiquitous open standard formats such as image and audio files generated by the
Library's own digitisation programs. On the other hand, pathways such as emulation of hardware or
software, software archiving, and viewer migration will probably be needed for file formats for which
there is no safe migration path.
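The reasoning in this paragraph can be expressed as a simple decision rule. The sketch below is illustrative only: the names are hypothetical, the two yes/no inputs are a deliberate simplification, and the final fallback of keeping and refreshing files is taken from the nomination's management plan rather than from this policy.

```python
from enum import Enum


class Pathway(Enum):
    MIGRATION = "format migration"
    EMULATION_OR_SOFTWARE_ARCHIVING = "emulation, software archiving or viewer migration"
    KEEP_AND_REFRESH = "keep and refresh until a pathway emerges"


def choose_pathway(has_safe_migration_path: bool, alternative_available: bool) -> Pathway:
    """Pick an access pathway for a file format (simplified, assumed rule)."""
    if has_safe_migration_path:
        # Typical of ubiquitous open standard formats, per the paragraph above.
        return Pathway.MIGRATION
    if alternative_available:
        # No safe migration path, but an emulator, archived software or viewer exists.
        return Pathway.EMULATION_OR_SOFTWARE_ARCHIVING
    return Pathway.KEEP_AND_REFRESH
```

In practice the choice would also weigh cost, authenticity and rights, as required of every pathway in the list above.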
Research
Current knowledge is not adequate to address these policy directions satisfactorily; needs will
also change over time. Therefore, the Library will continue to encourage and undertake research
into areas in which it requires more knowledge to support its decision-making processes. Such
areas include:
The practicalities of storage, media refreshing and data transfer ;
Appropriate levels and kinds of duplication and redundancy of storage ;
Access pathways including migration of data between file formats; emulation of hardware
and software platforms; maintenance of access technologies including access to required
software; and data recovery ;
Appropriate levels of diversity in managing digital collections ;
Options for dealing with resource types currently not well represented in the Library's
collections, such as email messages or database-structured resources ;
Emerging risks and strategies for dealing with them ;
Quality control issues.
Standards development
The Library believes that standards are a vital element in developing reliable archiving and
preservation procedures. The Library will continue to support standards development in Australia and
internationally where it appears to offer direct or indirect support to the Library's business needs.
Working with others to preserve the nation's digital information resources
The National Library is one of many players with an interest in ensuring the national documentary
heritage is preserved and accessible. Others with such an interest may include State, Territory and
University libraries; some public libraries; national, state and organisational archival agencies;
museums; creators, publishers, and re-users of resources; information users; government, and the
community generally.
The Library seeks to work with others who are taking, or could take, responsibility for preserving
components of Australia's digital information resources. In working with such partners, the Library
wishes to:
Identify appropriate partners and stakeholders able to contribute to a national effort ;
Establish agreements on responsibilities and roles ;
Pursue explicit, mutually-accountable agreements that provide a reliable basis for ongoing
accessibility over long periods ;
Help identify, develop and promote policies, procedures and tools to support such an aim ;
Work with creators, publishers and re-users of digital content to encourage practices that will
enable, rather than hinder, preservation ;
Work with governments to develop legislative and funding frameworks that will enable cost-effective preservation.
Working with others to foster digital preservation
The Library places great importance on working with others in Australia and internationally to develop
practices, strategies and understandings in support of digital preservation. This is evidenced in the
Library’s commitment to the PADI service, an international subject gateway on digital preservation.
The Library seeks to:
Freely share information on its approaches, policies, systems, strategies, tools and
experiences ;
Learn from the experience of others as well as its own experience ;
Maintain and improve mechanisms for comparing approaches and strategies, and for
reviewing developments, through the PADI website. The Library will continue to seek a
level of buy-in to PADI that will enhance its role as an internationally useful subject
gateway ;
Participate in research and development projects where there is an identifiable benefit to the
Library and to others involved in preserving Australia's digital information resources.
Contact
The Director of the National Library’s Preservation Services Branch is responsible for maintenance
and implementation of this Digital Preservation Policy.
17 July 2001; revised 24 February 2002