MEMORY OF THE WORLD REGISTER
PANDORA, Australia’s Web Archive
REF N° 2004-28
PART A – ESSENTIAL INFORMATION
1 SUMMARY
Nature of the nomination
1.1 PANDORA, Australia’s Web Archive, is a collection of copies of significant Australian online publications and web sites issued on the Internet. The National Library of Australia and its partners[1] are building the Archive to ensure long-term access to significant Australian documentary heritage that is published online.
1.2 The Archive is stored and managed by the Library and can be searched and accessed via the Internet. It is currently growing at the rate of about 2,000 new titles per year, and the titles added to it are selected on the basis of stringent selection criteria.
1.3 PANDORA is the first example in the world of a publicly accessible archive of web resources, based on well-established principles of collection development and access.
1.4 PANDORA is proposed for the Memory of the World Register to highlight that information in digital formats is as important as any other to our cultural and documentary history and needs to be preserved. A significant proportion of the world’s documentary heritage is now published only online. If it is not collected and managed, it will be lost to future generations.
2 DETAILS OF THE NOMINATOR
2.1 Name (person or organisation)
Janice Lillian Fullerton
Director General
National Library of Australia
2.2 Relationship to the documentary heritage nominated
The nominator is the Director General of the National Library of Australia, which is the owner and custodian of the Archive.
2.3 Contact person(s)
Pam Gatenby
Assistant Director General
Collections Management
2.4 Contact details (include address, phone, fax, email)
Address: National Library of Australia, Parkes Place, ACT 2600
Telephone: 02 6262 1672
Fax: 02 6273 2545
Email: pgatenby@nla.gov.au
[1] The PANDORA partners are the National Library of Australia, the Northern Territory Library and Information Service, the State Library of Queensland, the State Library of New South Wales, the State Library of Victoria, the State Library of South Australia, the State Library of Western Australia, ScreenSound Australia, the Australian War Memorial, and the Australian Institute of Aboriginal and Torres Strait Islander Studies.
3 IDENTITY AND DESCRIPTION OF THE DOCUMENTARY HERITAGE
3.1 Name and identification details of the items being nominated
PANDORA, Australia’s Web Archive
http://pandora.nla.gov.au/index.html
The owner and custodian of PANDORA is the National Library of Australia. The documentary heritage is located within the Library at Parkes Place, Canberra ACT 2600, Australia.
3.2 Description and inventory, including cataloguing/guide or similar access information
3.2.1 PANDORA, Australia’s Web Archive, is accessible at http://pandora.nla.gov.au/index.html. A full browsable list of the 6,000 titles currently in the Archive is available on this home page. All titles contained in the Archive are individually catalogued according to library conventions in the National Library’s online catalogue, as well as in the National Bibliographic Database (NBD).
3.2.2 The National Library considers the nominated heritage to be the contents of the PANDORA Archive, not the policy documents and technical infrastructure, though they have played a significant role in determining the type of archive it is and its scope and evolution.
3.2.3 The National Library of Australia is committed to preserving the contents of the Archive in perpetuity.
3.2.4 It is not possible to preserve the technical infrastructure in operational form. The Library already keeps an historic register of the source code. It also keeps a preservation copy of the display versions of each title and instance[2] so that the Archive could be reconstituted in its present form if ever required in the future. Should PANDORA be accepted for inclusion on the World Register, the National Library would be happy to document the existing contents of the Archive and its interface by undertaking to do the following:
- Keep a snapshot of the interface;
- Copy the PANDAS[3] database on a specified date of relevance to this nomination, so that titles and instances that are part of the Archive at this time will be recorded;
- Keep copies of the selection guidelines that have shaped the Archive available for consultation in the future.
[2] An ‘instance’ is a single gathering of a title. It includes the gathering of a monograph that will be archived only once, the first gathering of a serial title or integrating title (for example a web site that changes over time), and all subsequent gatherings.
[3] PANDAS is the PANDORA Digital Archiving System developed by the National Library of Australia to support the collection, management, provision of access to and preservation of online publications and web sites.
Bibliographic and registration details
3.2.5 PANDORA : Australia’s Web Archive [electronic resource] / National Library of Australia and partners. Canberra : National Library of Australia, 1996 -. PANDORA is an online collection of significant Australian publications and web sites, containing 6,024 titles as of 26 May 2004.
The Archive is accessible at http://pandora.nla.gov.au/index.html
Summary of its provenance (for example, how and when was the material acquired and integrated into the holdings of the institution)
3.2.6 In 1995 the National Library identified the growing amount of Australian information published only online as a matter needing attention. The Library accepted that it had a responsibility to collect and preserve Australian publications, regardless of format.
3.2.7 In 1996 the Library began to develop selection guidelines for this category of material. The Australian Electronic Unit was established to select online publications according to the guidelines, to negotiate with publishers for the right to archive them, and to catalogue them onto the National Bibliographic Database.
3.2.8 Work proceeded on two levels at the same time: developing policy and theoretical models for the work, and undertaking practical experiments in web archiving, storage and access using freely available software. The first two titles were downloaded in October 1996. By June 1997 the Archive contained 31 titles.
3.2.9 By 1998 policy, procedures and infrastructure were sufficiently developed to invite the State libraries to become partners, and in August the State Library of Victoria became the first partner. By March 2004 five State libraries, the Northern Territory Library and Information Service, ScreenSound Australia: the National Screen and Sound Archive, the Australian War Memorial, and the Australian Institute of Aboriginal and Torres Strait Islander Studies had become partners, making ten contributing agencies in all.
3.2.10 To support the acquisition and management of increasing volumes of data, as well as to support more efficient distributed archive building among partners, the Library developed the PANDORA Digital Archiving System (PANDAS), the first release of which took place in June 2001, with version 2 being released in August 2002.
Further development of the software commenced in the first quarter of 2004.
3.2.11 The National Library and its partners select online publications for archiving; register them in PANDAS; negotiate a copyright licence with publishers (including the right to copy publications and web sites into the Archive and provide access to them in perpetuity); carry out quality assessment to ensure that all the functionality of a web publication has been captured in the archived copy; and create a catalogue record for them for inclusion in the National Bibliographic Database and the local catalogues of library partners.
3.2.12 The Archive is held centrally at the National Library, which takes responsibility for maintaining the Archive, backing it up according to standard Information Technology management practices, and taking preservation action over time, as required.
Analysis or assessment of physical state and condition, such as description of storage arrangements, conservation diagnosis, etc.
3.2.13 PANDORA is a collection of computer files that constitute copies of selected online publications and web sites issued on the Internet. A title in the Archive may consist of a single file, such as a text document in Portable Document Format (PDF), for example, Annual Report to the NSW Environment Protection Agency <http://pandora.nla.gov.au/tep/42658>. Alternatively, it may be a complex web object, such as a large web site consisting of thousands of files in a variety of formats, such as text, sound, image or video, for example, the Arafura Games <http://nla.gov.au/nla.arc-14228>.
3.2.14 PANDORA contributors preserve the ‘look and feel’ (appearance and functionality) of a publication or web site, as well as its contents, to the greatest extent possible. With the publisher’s permission, a harvesting robot is sent to the publisher’s site to harvest the publication or web site and bring a copy of it back to working space within PANDAS.
Staff of the National Library and its partners then check this copy for completeness and functionality before consigning it to the Archive for public access.
3.2.15 At least three copies of each archived item are made, one for display and two for preservation purposes. The display copy is stored on a dedicated server at the National Library. The two preservation copies are stored in the Library’s Digital Object Storage System, where all the Library’s digital collections are stored under optimum security conditions and in a way that facilitates appropriate preservation activity as required in the future.
Visual documentation
3.2.16 The live site is available at <http://pandora.nla.gov.au/index.html>.
Bibliography
3.2.17 Beagrie, Neil. National Digital Preservation Initiatives: An Overview of Developments in Australia, France, the Netherlands and the United Kingdom and of Related International Activity. Washington, D.C.: Council on Library and Information Resources and Library of Congress, 2003. <http://www.clir.org/pubs/reports/pub116/pub116.pdf> This report provides an overview of selected national and multinational initiatives in digital preservation occurring outside North America, including PANDORA.
3.2.18 Cathro, Warwick, Colin Webb, and Julie Whiting. Archiving the Web: The PANDORA Archive at the National Library of Australia. Canberra: National Library of Australia, 2001. <http://www.nla.gov.au/nla/staffpaper/2001/cathro3.html>
3.2.19 Day, Michael. Collecting and Preserving the World Wide Web: A Feasibility Study Undertaken for the JISC and Wellcome Trust. Bath: UKOLN, University of Bath, 2003. <http://www.jisc.ac.uk/uploaded_documents/archiving_feasibility.pdf> This study examines PANDORA in the context of digital archiving world wide and recommends that the Wellcome Trust establish an archive using the PANDORA methodology and the PANDAS software for this archiving activity.
Referees
3.2.20 Professor Ross Harvey
Library and Information Management
School of Information Studies
Charles Sturt University
Locked Bag 675
Wagga Wagga NSW 2678
Phone: +61 2 6933 2369
Email: rossharvey@csu.edu.au
Professor Harvey has a long-standing interest in preservation management. Since 1988 he has written over 25 articles and 4 books and has accepted numerous invitations to contribute to workshops and conferences in this field. He is currently working on a book entitled Preserving Digital Objects: An Australian Perspective, and in preparation for this spent some time at the National Library investigating its digital preservation program and PANDORA. Professor Harvey is leading a group developing a register of lost and missing documentary heritage for the Australian program of UNESCO's Memory of the World Program, and is presenting papers on preservation at the ALIA (Australian Library and Information Association) and LIANZA (Library and Information Association of New Zealand Aotearoa) conferences in September 2004.
3.2.21 Mr Neil Beagrie
British Library-JISC Partnership Manager
British Library
96 Euston Road
London NW1 2DB
Phone: +44 709 204 8179
Email: Neil.Beagrie@bl.uk
Neil Beagrie has been extensively involved in the field of digital preservation in various positions he has held in the U.K., including assistant director of the Arts and Humanities Data Service (AHDS) and program director for digital preservation in the Joint Information Systems Committee (JISC). He was co-author of Preservation Management of Digital Materials, a study published by the British Library in 2001, and was the author of National Digital Preservation Initiatives: An Overview of Developments in Australia, France, the Netherlands, and the United Kingdom and of Related International Activity, a study published by the Library of Congress in 2003.
The latter study investigated the National Library's PANDORA Archive and set it in the context of other key digital archiving programs taking place world wide. He now works at the British Library as British Library-JISC Partnership Manager.
3.2.22 Laura Campbell
Associate Librarian for Strategic Initiatives
Library of Congress
101 Independence Ave SE
Washington DC 20540
USA
Phone: +1 202 707 7849
Email: lcam@loc.gov
Ms. Campbell joined the Library of Congress in 1992 as Director of Library Distribution Services. She is currently Director of the National Digital Library Program (NDL). In this position, Ms. Campbell managed the innovative American Memory Program, a cooperative effort to digitize and make available online significant items pertaining to American culture and history. Since 2000, Ms. Campbell has also held the position of Associate Librarian for Strategic Initiatives. As such, she is responsible for the development of the National Digital Information Infrastructure and Preservation Program (NDIIP) in collaboration with other cultural and heritage institutions. NDIIP is a plan approved and funded by Congress to establish a strategy for the Library of Congress in collaboration with other federal and nonfederal entities, to [...]. Investigations into the technical aspects of digital preservation have comprised a major component of the project thus far, and eventually the NDIIP will make recommendations to Congress about the best options for a long-term national preservation strategy. Ms. Campbell is therefore very well aware of the importance of such work, the requirements for trusted national archives, and the challenges involved in implementing them.
4 JUSTIFICATION FOR INCLUSION/ ASSESSMENT AGAINST CRITERIA
Authenticity
Yes. Authenticity of a digital archive is quite a complex matter and involves a number of factors. Are the individual items within the archive faithful copies of the original and does the archive as a whole have integrity?
Authenticity of a digital archive depends on effective, ongoing management strategies to ensure that the archive and individual items in it are not accidentally or deliberately changed.
Policies and procedures for authenticity at the title level
4.1.2 The National Library has a number of policies and procedures in place to keep the PANDORA Archive authentic, especially its intellectual content.
4.1.3 Substantial effort is invested in ensuring the authenticity and integrity of each individual title. This attention is possible with selective archives and is one of their major advantages. In contrast, whole domain archives, such as the Internet Archive or the archive of the National Library of Sweden, collect such a large volume of material that quality assurance cannot take place. A significant proportion of titles in whole domain archives at the present time are therefore incomplete in content or lacking in functionality.[4]
4.1.4 In copying a publication or Web site into the Archive, the policy of the PANDORA partners is to maintain its ‘look and feel’, that is, its appearance and functionality, as well as contents, to the fullest extent possible. In most cases this is achievable. In some cases, however, because of the technical set-up of a site or because of the limitations of current harvesting technology, it is not possible to reproduce a site exactly. Some functionality may be missing, for instance, the ability to conduct a search of an archive of back issues on the web site of an e-journal. Functions that involve interaction with publishers’ sites have to be disabled.
4.1.5 The Library is constantly working to overcome technical limitations that prevent it from archiving some publications or parts of them. A current research project to archive and provide access to publications structured as databases is a good case in point.
4.1.6 Where it is not possible to replicate exactly the ‘look and feel’ and functionality of a title with current technology, the Library takes a pragmatic view and argues that it is better to compromise on these aspects for the benefit of being able to preserve the intellectual content. It is essential that the intellectual content be preserved exactly as produced by the publisher. All sites are quality checked and are accurate copies of the publishers’ sites at the time of archiving. The date of archiving of each instance is recorded by the system and is clearly indicated.
[4] For further information about the respective advantages and disadvantages of selective and whole domain harvesting, see section ...[comparison with other archives]
4.1.7 Preservation activity to keep titles accessible as hardware and software change may also necessitate some changes to ‘look and feel’ and functionality. Once again, preservation of the intellectual content exactly as produced by the publisher will be of paramount concern.
Policies and procedures for authenticity at the Archive level
4.1.8 At the Archive level, a number of policies and practices are in place to preserve authenticity.
4.1.9 At least three copies of each instance of a title are kept:
- The preservation master, which is derived from the raw output of harvested files and associated work logs before any human or machine interaction occurs. It is therefore an exact copy of what is downloaded from the publisher’s site and will be kept in perpetuity.
- The access master, which is the set of harvested files that have undergone quality assurance procedures (modification of links, addition of missing images, etc.) to make the instance suitable for public display. The access master will also be kept in perpetuity.
- The access copy, which is served to users of the Archive.
4.1.10 The preservation and access masters are stored separately from the access copy on the Library’s secure Digital Object Storage System (DOSS).
Backup tapes of both the DOSS and the PANDORA Archive server are maintained, including copies that are stored off-site.
4.1.11 These practices contribute to the authenticity of the Archive in the following ways:
- Should the Archive fail or be destroyed or corrupted, backup copies would enable the restoration of the Archive;
- Should a hacker succeed in tampering with one of the copies of an instance (for explanation of ‘instance’ see footnote 2), it is highly unlikely that the hacker would succeed in tampering with all copies on all backup tapes, and the instance could be restored from the intact copies;
- Should quality control or preservation activity change an instance to an unacceptable degree, or alter the intellectual content at all, the preservation master can be returned to for an exact replication of the title as it was on the publisher’s site.
Persistent identifiers
4.1.12 Another aspect of authenticity is persistence. Each item in the Archive, from title level right down through instances and parts of instances to component files, has a unique persistent identifier automatically assigned by the digital archiving system. This enables authors to cite works and parts of works in the Archive using the appropriate persistent identifier. Readers can return to the cited item in the Archive again and again, confident that it will remain there persistently and that it will be the same.
World significance, uniqueness and irreplaceability
Uniqueness
4.2.1 The PANDORA Archive is unique. It is recognised by the library and archives sector as the leader of its type (a selective archive) in the world. It is the only archive to be built collaboratively. The two overseas studies cited in the bibliography (see paragraphs 3.2.17 & 3.2.19) indicate its importance in world terms and place it in context.
4.2.1.1 PANDORA is the first example in the world of a publicly accessible archive of web resources, based on well-established principles of collection development and access, that has the following characteristics:
- It is selective and developed according to clearly defined and published selection guidelines.
- It has been more adventurous and innovative than other web archives in its approach to collecting a wide range of formats and includes both static and dynamic[5] publications and web sites. It therefore represents a wide range of publication types and formats employed by publishers and creators on the Web.
- Each title that is added to the Archive is quality assessed to ensure completeness of intellectual content and functionality.
- Each title in the Archive is catalogued, with a record in the National Library’s and other partners’ online catalogues, as well as in the National Bibliographic Database[6]. It therefore conforms to the IFLA recommendation[7] that online resources be included in the national bibliography.
- It provides networked access to researchers world-wide.
- Up until now it has been the only web archive managed by a cultural heritage institution that is built collaboratively with other partners.
Dynamic as well as static publications
4.2.1.2 PANDORA is unique. It is a selective archive that includes dynamic publications and web sites as well as static publications. Other national archives, such as those of the National Libraries of Canada, Denmark, and Japan, have been set up to collect online static publications, that is, documents that are accessed by hypertext links only.
4.2.2 Irreplaceable. For an estimated six per cent of the titles in the Archive the publisher’s site no longer exists on the live Web and the title is therefore irreplaceable. As time passes, this percentage will increase rapidly.
4.2.2.1 This does not, however, represent the full extent of the irreplaceable material in the Archive.
While live publishers’ sites remain for most of the titles in the Archive, many of these sites have changed over time and PANDORA has captured this changing content through repeat gatherings. While it is not possible to quantify the amount of material in this category which is now unique to PANDORA, it is estimated to be a significant amount.
[5] A ‘dynamic’ web resource is defined as ‘A web document that is created from a database in real-time or "on the fly" at the same time it is being viewed, providing a continuous flow of new information and giving visitors a new experience each time they visit the web site.’ This definition is given in www.about-the-web-.com: An Internet Guide for Newcomers to the World Wide Web. It is available online at <about-theweb.com/shtml/glossary.shtml>. Consulted 30 June 2004.
[6] The National Bibliographic Database is a union catalogue of records of over 850 Australian libraries, access to which is provided by the Kinetica service. It is available online at <http://www.nla.gov.au/kinetica/>. Consulted 12 March 2004.
[7] International Federation of Library Associations (IFLA). The final recommendations of the International Conference on National Bibliographic Services, 1998. Available online at <http://www.ifla.org/VI/3/icnbs/fina.htm>. Consulted 3 May 2004.
Examples include:
- Fineart Forum: Art + Technology Net News <http://pandora.nla.gov.au/tep/11009>. This e-zine commenced publication on the Internet in 1987, before the creation of the World Wide Web. From 1987 to 1995 its issues are in very plain text with no images. From 1996, however, it started to take advantage of the design and navigation features offered by HTML[8]. While it remains a static publication, it frequently changes design, and the Archive records the evolution of its design as well as preserving its intellectual content for posterity. See Appendices 4 & 5 for images of early and late versions of this publication.
- Worlds of Sara Douglass <http://pandora.nla.gov.au/tep/10349>. The site was archived every year from 1998 to 2002, when it changed on a regular basis.
- Prime Minister of Australia, John Howard <http://pandora.nla.gov.au/tep/10052>. Repeated harvests of this site document change from 1998 to 2004 in the web presence of a major political figure.
4.2.2.2 PANDORA documents the early years of publication on the Australian Internet. In doing so it also documents the early years of the world Internet. Because it was the second national archive to be established, because it is the only selective archive to actively collect dynamic publications and web sites, and because it undertakes quality assurance of archived titles to make sure they work as they are supposed to, it is likely that PANDORA is the only repository for some types of early online publications and web sites that have now disappeared from the Web.
4.2.3 Significance
PANDORA is significant in world terms. It contains Web sites and publications chosen because of their social, political, cultural, religious, scientific and economic significance, an increasing number of which are no longer available anywhere else.
4.2.3.1 For example, Sydney 2000: Official Site of the Sydney 2000 Olympic Games <http://pandora.nla.gov.au/tep/10194> is one of the most heavily used titles in the Archive and is becoming even more popular as the 2004 Olympics approach. It disappeared from the live Web soon after the close of the Games in 2000, and it was the first web site for an Olympic Games to have been captured. This site was captured a number of times to record change leading up to the Games and then every day while the Games were underway.
4.2.3.1 PANDORA is historically significant because:
- It includes publications and web sites that once existed on the Internet but have now disappeared. The only way now to refer to these is via the PANDORA Archive.
In addition, it includes significant publications that are only in online form and which will in time disappear from the publishers’ web sites.
- PANDORA was established in 1996[9], just three years after the creation of the World Wide Web in 1993. It therefore provides a record of the early years of this new and revolutionary publication and communication medium, which is freely available to researchers.
- Its objective is to maintain access in perpetuity to resources in it by taking appropriate preservation action. It is not just for short-term access purposes.
- It recognises that online or digital publishing forms an important part of a nation’s heritage.
- PANDORA has been influential in the development of web archiving. It demonstrates that it is possible, with firmly based business models and objectives that govern operations, for national libraries and other cultural heritage institutions to treat web resources in a way that is consistent with their remit to collect, document and provide access to cultural heritage.
[8] ‘HTML’ stands for HyperText Markup Language and is the major language of the Internet’s World Wide Web. Web sites and web pages are written in HTML. This definition is given in HTML : An Interactive Tutorial for Beginners. It is available online at <http://www.davesite.com/webstation/html>. Consulted 30 June 2004.
[9] Experimental archiving for PANDORA commenced in 1996, with regular archiving established by mid 1997. The Archive contains files published electronically as early as 1987.
4.2.3.2 PANDORA has aesthetic significance because it preserves the appearance and functionality (the ‘look and feel’) of publications and web sites, as well as their intellectual content, and the evolution in presentation and format of items mounted on the Web.
4.2.3.3 PANDORA has social and spiritual significance.
It includes web sites created by many different groups and communities to express identity, communicate views, beliefs and concerns, and to celebrate their contribution to the broader society. These include web sites by and for indigenous peoples and multi-cultural communities from around the world.
PANDORA as a model for other archives
4.2.4 PANDORA is the first of its kind and other national and institutional archives are emulating it. For instance, the Library of Congress has modelled the user interface for its MINERVA Archive on PANDORA.
4.2.5 A feasibility study by the Wellcome Trust, in seeking a solution for the collection and preservation of web sites of relevance to medical research, recommended that the Trust ‘Establish a pilot medical Web archiving project using the selective approach as pioneered by the National Library of Australia… The pilot should consider using the NLA’s PANDAS software for this archiving activity. This pilot could be run independently or as part of a wider collaborative project with other partners.’[10]
4.2.6 The Wellcome Trust has now joined with other collecting institutions in the United Kingdom, including the British Library, the National Archives, and the Scottish and Welsh National Libraries, to form the UK Web Archiving Consortium. This Consortium is planning a selective archive modelled in part on PANDORA and has contracted with the National Library of Australia to use the PANDAS software for its web archiving program. Other national libraries have also evaluated the software, or plan to, with a view to developing archives based on similar principles using the software.
Online ‘visits’ to PANDORA
4.2.7 The majority of online visitors to PANDORA come from countries other than Australia. In May 2004, of a total of 161,299 visits to the Archive, only 49,240 were from Australia.
Apart from 30,209 visits from ‘region unspecified’, the remainder (81,850) were from elsewhere in the world, including North America, Europe, Asia and the Pacific Islands. As titles in the Archive inevitably disappear from their live sites, the value of the Archive to researchers will increase.
[10] Day, Michael. Collecting and Preserving the World Wide Web: A Feasibility Study Undertaken for the JISC and Wellcome Trust. Bath: UKOLN, University of Bath, 2003. Page 3. Available online at <http://www.jisc.ac.uk/uploaded_documents/archiving_feasibility.pdf>
Scope of PANDORA
4.2.8 Each of the PANDORA partners selects titles for the Archive according to stringent selection guidelines that are published on the PANDORA web site. The National and State libraries archive those publications and web sites relating to the published output of their jurisdictions. ScreenSound Australia takes responsibility for sites relating to music and film; the Australian War Memorial archives sites relating to Australian military history; and the newest partner, the Australian Institute of Aboriginal and Torres Strait Islander Studies, archives the publications and web sites of our Indigenous peoples. The National Library of Australia’s selection guidelines are available at http://pandora.nla.gov.au/selectionguidelines.html.
4.2.9 The PANDORA Archive contains publications and web sites carefully selected for their significance and long-term research value. It is estimated to comprise less than one per cent of the Australian web domain.
4.2.10 It contains a wide range of publications and Web sites. High priority is placed on collecting government publications and academic e-journals. In addition there are many other types of sites.
The following are just a few examples in a few categories:
Cultural activity
- Bangarra Dance Theatre <http://pandora.nla.gov.au/tep/14134>
- Sydney Film Festival <http://pandora.nla.gov.au/tep/25307>
- Australian Girls’ Choir <http://pandora.nla.gov.au/tep/23206>
Community concerns
- International Year of Volunteers <http://pandora.nla.gov.au/col/c5040>
- Bali Bombing, 12 October, 2002 <http://pandora.nla.gov.au/col/c8200>
- A Bill of Rights for the ACT? <http://pandora.nla.gov.au/tep/36559>
Scientific standards and research
- Asbestos: Code of Practice for the Safe Removal of Asbestos <http://pandora.nla.gov.au/tep/34806>
- Qualitative Research Journal <http://pandora.nla.gov.au/tep/34053>
- Strategic Issues for Australian Gene Technology <http://pandora.nla.gov.au/tep/31208>
Politics and government
- Australia’s Constitutional Convention <http://pandora.nla.gov.au/tep/10482>
- 1998 Federal Election Campaign <http://pandora.nla.gov.au/col/c4001>
- Crikey <http://pandora.nla.gov.au/tep/13027>
Indigenous peoples
- Native Title Conference, held Adelaide University, July 2001 <http://pandora.nla.gov.au/tep/32635>
- Reconciliation Australia <http://pandora.nla.gov.au/tep/24362>
- Zero Tolerance Policing: Its Background and Implications for Aboriginal People <http://pandora.nla.gov.au/tep/25328>
Sport
- Women’s Hockey Australia <http://pandora.nla.gov.au/tep/10785>
- Independent Soccer Inquiry <http://pandora.nla.gov.au/tep/34957>
- 2003 Melbourne Cup Carnival <http://pandora.nla.gov.au/col/c8275>
Variety of online formats
4.2.11 The following are some examples of the variety of formats in PANDORA:
- Ngapartji's virtual writers in residence <http://pandora.nla.gov.au/tep/10247>. This site was archived during 1998 and 1999 and includes some early multimedia. It is no longer available from the publisher’s site.
- Official 1998 Mardi Gras netcast <http://pandora.nla.gov.au/tep/10023>. This site documents the first net cast of the Mardi Gras in Sydney.
Some of the links do not work, but the videos of the net cast are there. It is no longer available from the publisher’s site.
Bangarra Dance Theatre <http://pandora.nla.gov.au/tep/14134>. This site contains beautiful video clips of some of the company’s dances.
OnSecure <http://nla.gov.au/nla.arc-39351>. This is a dynamically generated database site. Note that in the archive it is stored as static pages.
Comparison with other archives
4.2.12 The PANDORA Archive was one of the first archives of web publications to be established anywhere in the world. From the beginning, the National Library of Australia and its partners have been more adventurous than others in selecting and finding solutions for archiving dynamic web sites, as well as static publications. The PANDORA Archive therefore contains a much wider range of publications and web sites and illustrates a much richer range of approaches to web publishing and the technologies that publishers use than other archives do. As the only national archive in the world which actively collects dynamic web sites and checks that their functionality has been captured, this Archive will provide a record of world significance for early dynamic web publications and sites.
Influence on other archives
4.2.13 The PANDORA Archive has had a very strong influence on digital archiving world wide, as is evident from international studies in which it features prominently (see 3.2.17 & 3.2.19). There are several reasons for this.
The Archive was one of the first in the world to be established;
The Archive is based on sound policy, procedure, and infrastructure development that is well documented and available from the PANDORA web site <http://pandora.nla.gov.au/index.html>;
The Archive is accessible to anyone, anywhere in the world, which, as explained above, is unusual for copyright reasons.
The National Library of Australia has adopted a more adventurous and inclusive selection policy, and has been willing to tackle the more difficult dynamic formats. The Archive is therefore seen as a model for what can be achieved;
The National Library of Australia has developed digital archiving system software, which, because of the lack of alternative systems anywhere in the world, has excited interest among agencies that plan to initiate digital archiving programs. The Library is providing access to the software to other agencies for evaluation purposes;
The Archive is unique because it is built on a collaborative model. The National Library of Australia has been willing to share the results of its experience and is active in efforts to collaborate internationally. It has joined and is actively participating in the International Internet Preservation Consortium.
4.2.14 Through its Charter on the Preservation of Digital Heritage11, UNESCO has acknowledged that born-digital heritage available on-line is part of the world’s cultural heritage, and that the vulnerability of this material to rapid and complete loss needs to be addressed. The Charter was adopted by member states during the 32nd session of the General Conference of UNESCO in October 2003. UNESCO has supported the Charter with the publication of Guidelines for the Preservation of Digital Heritage12 to assist member states to develop policies and procedures on collecting and preserving their heritage in digital formats. The National Library of Australia was contracted to undertake the consultative process required to formulate and write these Guidelines because of its international standing, experience and knowledge gained through the development of the PANDORA Archive. The Archive therefore has already been of influence and value in world terms.
4.2.19 The National Libraries of Canada, Denmark and Sweden, and the Internet Archive in the United States commenced archiving at about the same time as the National Library of Australia, in the years 1995 to 1997. Canada, Denmark and Australia all took a selective approach to archiving, while Sweden and the Internet Archive took a ‘whole domain’ approach. In the case of the Internet Archive, the ‘whole domain’ is, in theory, the whole of the Internet.
4.2.15 Each of these archives and the approaches to archiving on which they are based have strengths and weaknesses.
4.2.16 A selective approach to archiving enables libraries to achieve four important objectives:
Each item in the archive is quality assessed and functional to the fullest extent permitted by current technical capabilities;
Each item in the archive can be fully catalogued and therefore can become part of the national bibliography;
Each item in the archive can be made accessible via the Web immediately. In the case of Australia, all but 144 are accessible now, and most of the remainder will be available within five years, owing to the fact that permission to make publications available to the public via the Web has been negotiated with the publishers;
What is in the archive is known and documented. Preservation needs can be analysed, the risks assessed and preservation strategies formulated.
4.2.17 The chief disadvantage of the selective approach is that libraries are making subjective judgments about the value of resources and what researchers of the future are likely to find useful. The way that researchers will want to access, use and apply the potential of the Web is still developing and the selective approach does not preserve the context for a given publication or web site.
In theory, then, the obvious advantage of the ‘whole of domain’ approach would seem to be that the whole domain is captured at periodic intervals, with resources able to be seen in their broader context, with links to other documents retained.
4.2.18 In practice, this whole domain advantage is flawed. Because whole domain harvests are demanding in terms of computer time and storage, they are usually run at intervals of at least a few months. Any publications, regardless of their significance, which come into being and change or disappear in the interim, are missed. Because of the huge volume of publications involved, quality control checks cannot be made on more than a very small sample of titles. The National Library of Australia’s experience would suggest that at least 40 per cent of harvested titles will be incomplete or defective in some way.
11 UNESCO. (2003) Charter on the Preservation of Digital Heritage. URL: <http://portal.unesco.org/ci/ev.php?URL_ID=13366&URL_DO=DO_TOPIC&URL_SECTION=201&reload=1067609511>
12 UNESCO. (2003) Guidelines for the Preservation of Digital Heritage. URL: <http://portal.unesco.org/ci/ev.php?URL_ID=8967&URL_DO=DO_TOPIC&URL_SECTION=201&reload=1069628134>.
4.2.19 Commercial sites that employ passwords or other inhibitors to access will not be accessible to harvesting robots and therefore will not be gathered. Databases and other dynamically driven sites will also be absent from a whole domain archive.
4.2.20 Because copyright and legal deposit law has been slow to respond to the new conditions pertaining to the online environment, providing access to archived publications is problematic unless permission is negotiated with publishers. Such negotiation is impossible when large volumes of sites are being archived. None of the national libraries engaged in whole domain harvesting provide networked access to their archives. If access is provided at all, it is limited to a single PC in the library’s reading room.
4.2.21 The Internet Archive13 is a not-for-profit initiative of Brewster Kahle, the founder of Alexa, a company that has been indexing the World Wide Web since 1996. Alexa donates data to the Internet Archive. The aim is to preserve as much of the Web as possible by copying web pages and storing them. It holds billions of web pages. Access is provided to this huge archive via the Wayback Machine on the Internet Archive site.
4.2.22 Although it is not possible to quantify it at this time, the Internet Archive contains many Australian publications and web sites. It is likely that it contains more Australian material than the PANDORA Archive does and as such is a valuable resource for researchers who are looking for online publications that are no longer available from the publishers’ sites. However, it is subject to all of the disadvantages of a comprehensive or ‘whole domain’ archive, as outlined in paragraphs 4.2.23-24. The Internet Archive gets around the usual copyright problem by gathering sites without permission and offering to take them down if copyright owners ask it to do so.
4.2.23 In contrast to the Australian content in the Internet Archive, the PANDORA Archive is a finely honed national collection of significant publications and web sites developed by specialists in collection building with attention paid to the completeness and functionality of titles. Access, as explained elsewhere, is provided through traditional library catalogues, as well as through mechanisms such as browse lists and search engines.
4.2.24 The National Library of Australia and the National Library and Archives of Canada have very similar archiving programs in some ways. Both follow the selective approach, both catalogue the archived titles, and both archives are freely available to anyone in the world, except for a small proportion of restricted titles.
The Canadian Archive is much larger than PANDORA, but is largely limited to static, text-based documents, predominantly government and commercial monographs and serials.
4.2.25 The archive of the National Library of Denmark was built as a result of legal deposit legislation which required the deposit of static online publications. This archive is accessible on a limited basis from a single PC in each of the National Library and the university library.
4.3 Criteria of (a) time (b) place (c) people (d) subject and theme (e) form and style.
Time
4.3.1 The Archive documents the early years of publication on the Internet. It contains not only selected publications of individual and substantial research value that appear in no other format, but also Web sites that illustrate how Australians were using this new method of communication from 1996 onwards to promulgate their views and share their experiences. It documents events, crises and social change, for example, through collections of sites about the Sydney Olympics, the Bali bombing, and the Republican debate.
13 The Internet Archive site is available at http://www.archive.org/index.php
Place
4.3.2 To be selected for inclusion in the Archive, a publication or web site should be about Australia, or be on a subject of significance and relevance to Australia and be written by an Australian author, or be written by an Australian of recognised authority and constitute a contribution to international knowledge. Its focus is therefore Australia and the Australian people. However, Australians participate in world affairs and therefore there are also sites of world and regional interest, for example the Sydney 2000: Official Site of the Sydney 2000 Olympic Games <http://pandora.nla.gov.au/tep/10194> and INTERFET Peace Keeping: International Force East Timor <http://pandora.nla.gov.au/tep/10661>.
People
4.3.3 As well as publications of individual and substantial research value, an important component of the Archive is formed by Web sites that collectively document Australian society and people. There are the sites of well known as well as ‘ordinary’ Australians: Ian Thorpe (sportsman) <http://pandora.nla.gov.au/tep/10846>; Pauline Hanson (politician) <http://pandora.nla.gov.au/tep/33908>; Kate Ceberano (musician) <http://pandora.nla.gov.au/tep/10463>; Albert Chapman (mineralogist) <http://pandora.nla.gov.au/tep/37671>; Bob Buick (Vietnam veteran) <http://pandora.nla.gov.au/tep/10085>; Nancy Crick (euthanasia campaigner) <http://pandora.nla.gov.au/tep/24513>; Trishan Ponnamperuma (11 year old) <http://pandora.nla.gov.au/tep/15005>.
4.3.4 There are sites via which Australians express their views (Australians for fairer tax) <http://pandora.nla.gov.au/tep/10120>; define their experiences (Bushfires, Canberra, ACT, Jan. 2003) <http://pandora.nla.gov.au/col/c8075>; celebrate their achievements (It’s an Honour) <http://pandora.nla.gov.au/tep/37444>; and their love of sport (Sports – Australian Internet Sites) <http://pandora.nla.gov.au/col/c4010>.
4.3.5 Particular attention has been paid to archiving the Web sites of Australian ethnic communities and the Official Page of the Polish Community in Australia <http://pandora.nla.gov.au/tep/32080> is one of the most heavily used sites in the Archive. The publisher’s site is no longer actively maintained: it points to the archived version. This is just one of approximately 130 archived web sites of communities originating from Europe, Asia, Africa and North and South America, which document the experience of migrating to and living in Australia.
4.3.6 4.3.7 Subject and theme Each of the PANDORA partners selects titles for the Archive according to selection guidelines that are published on the PANDORA Web site.
The National and State libraries archive those publications and web sites relating to the published output of their jurisdictions. This means that a wide range of subject matter is archived. In addition, the Archive receives the benefit of subject specialists. ScreenSound Australia takes responsibility for sites relating to music and film, the Australian War Memorial archives sites relating to Australian military history, and the Australian Institute of Aboriginal and Torres Strait Islander Studies adds sites relating to Australia’s Indigenous peoples. In addition to the publications of individual substantial research value, the National Library actively collects sites on a wide range of subjects, events and topical issues that document Australian life as comprehensively as possible. Details of the National Library’s program for collecting this category of material are available in Appendix 2B of Online Australian Publications: Selection Guidelines for Archiving and Preservation by the National Library of Australia <http://pandora.nla.gov.au/selectionguidelines.html>.
4.3.8 In addition, the Library actively collects sites documenting events and topical issues as they arise, for example, Federal elections, Australian participation in the Iraq war, refugees.
Form and style
4.3.9 Internationally the Archive is recognised as a key exemplar of a selective national digital archive. Reasons for this have already been referred to under 4.2.17. The Library of Congress stated publicly at the International Web Archiving Symposium in Tokyo in January 2002 that it had modelled the user interface for its MINERVA Archive on PANDORA because it could not determine a better way to do it.
4.4 Issues of rarity, integrity, threat and management
Rarity
4.4.1 PANDORA is the unique and successful response of a national library to the need to collect and preserve a nation’s significant online publications and web sites in the incunabula period of this new mode of communication and publication, the World Wide Web. It is a response that was formulated when no other solutions for archiving the full range of formats being published on the Web had been pioneered. As explained in Section 4.2, it has become a model for a number of other national libraries and research organisations establishing web archiving programs.
4.4.2 The individual exploratory phase of national libraries seeking solutions to web archiving is coming to an end. Only a handful took up the challenge in the mid to late 1990s, the early days of the Web, including the National Libraries of Australia, Canada, Sweden, Norway, Denmark and Finland. The Library of Congress, the National Diet Library of Japan and some European national libraries followed a little later. National libraries now recognise the benefits of collaboration and are working to develop common infrastructure, tools and methodologies, which has led to the establishment of the International Internet Preservation Consortium, in which the National Library of Australia is an active participant. This means that within the next couple of years it is likely that there will be a homogeneous and well-coordinated international response to the need to collect and preserve online heritage, which is very much needed.
4.4.3 As web archiving moves on and reaches new stages of sophistication, it is important to recognise and record the early examples which influenced development in the field, of which PANDORA was a leader.
4.4.4 It is estimated that, for about six per cent of titles in the Archive, the publisher’s site has ceased to exist altogether on the Internet.
It is almost certain that for many of these titles the copy in the PANDORA Archive is the only extant complete version. For many more titles the publisher’s site continues to exist but has changed, with earlier content being available only in PANDORA. As time passes, an increasing proportion of titles will no longer be available at all.
4.4.5 In the case of publications published in print, a number of libraries would collect any given title, and if a particular library missed acquiring a copy, there would still be a second chance through the second-hand market. There is no such second chance for online publications.
4.4.6 PANDORA partners are collecting online publications collaboratively, one copy for the whole of Australia. Apart from Our Digital Island <http://odi.statelibrary.tas.gov.au>, the archive at the State Library of Tasmania that collects Tasmanian publications, there is no other archive for Australian publications. A few universities have begun to establish e-print archives but these are still a long way from adequately covering publications of the tertiary education sector.
Integrity
4.4.7 The National Library has a number of policies and procedures in place to protect the integrity of the Archive and its contents.
4.4.8 At the title level, in copying a publication or web site into the Archive, the policy is to maintain its ‘look and feel’, that is, its presentation, content and functionality, to the fullest extent possible. In most cases this is achievable. In some cases, however, because of the technical set up of a site or because of the limitations of current harvesting technology, it is not possible to exactly reproduce a site. Some functionality may be missing, for instance, the ability to conduct a search of an archive of back issues on the web site of an e-journal. Functions that involve interaction with publishers’ sites have to be disabled.
4.4.9 Where it is not possible to replicate exactly the ‘look and feel’ and functionality of a title with current technology, the Library takes a pragmatic view and argues that it is better to compromise on these aspects for the benefit of being able to preserve the intellectual content. It is essential that the intellectual content be preserved exactly as produced by the publisher. All sites are quality checked and are accurate copies of the publishers’ sites at time of archiving. Date of archiving of each instance is collected by the system and is clearly indicated for each instance14. 4.4.10 Preservation activity to keep titles accessible as hardware and software changes may also necessitate some changes to ‘look and feel’ and functionality. Once again, preservation of the intellectual content exactly as produced by the publisher will be the first concern. 4.4.11 At least three copies of each instance of a title are kept: The preservation master, which is derived from the raw output of harvested files and associated work logs before human or machine interaction occurs. It is therefore an exact copy of what is downloaded from the publisher’s site. The access master, which is the set of harvested files that have undergone quality assurance procedures (modification of links, addition of missing images, etc.) to make the instance suitable for public display. The access copy, which is served to users of the Archive. 4.4.12 The preservation and access masters are stored separately from the access copy on the Library’s secure Digital Object Storage System (DOSS). Backup tapes of both the DOSS and the PANDORA Archive server are maintained, including copies that are stored off-site. 
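The three-copy arrangement described in 4.4.11 and 4.4.12 can be illustrated with a short sketch. The nomination does not say how the Library checks that a stored copy is still intact, so the use of SHA-256 checksums here is an assumption, and the role names and function names (record_fixity, damaged_roles, restore) are hypothetical labels chosen for illustration, not part of PANDAS or DOSS.

```python
import hashlib
from typing import Dict, List

# Hypothetical role names for the three copies described in 4.4.11.
ROLES = ("preservation_master", "access_master", "access_copy")


def digest(data: bytes) -> str:
    """Return the SHA-256 checksum of a stored copy (checksums are an assumption)."""
    return hashlib.sha256(data).hexdigest()


def record_fixity(copies: Dict[str, bytes]) -> Dict[str, str]:
    """Record one checksum per copy at the time an instance is archived."""
    return {role: digest(data) for role, data in copies.items()}


def damaged_roles(copies: Dict[str, bytes], recorded: Dict[str, str]) -> List[str]:
    """List the copies whose current checksum no longer matches the record."""
    return [role for role, data in copies.items() if digest(data) != recorded[role]]


def restore(copies: Dict[str, bytes], backups: Dict[str, bytes],
            recorded: Dict[str, str]) -> Dict[str, bytes]:
    """Replace each damaged copy with its backup, but only if the backup is intact."""
    repaired = dict(copies)
    for role in damaged_roles(copies, recorded):
        if digest(backups[role]) != recorded[role]:
            raise ValueError("backup for %s is also damaged" % role)
        repaired[role] = backups[role]
    return repaired
```

In this sketch each copy is checked against its own recorded checksum rather than against the other copies, because the access master is deliberately modified during quality assurance and so is not expected to match the preservation master.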
4.4.13 These practices contribute to ensuring the integrity of the Archive in the following ways:
Should the Archive fail or be destroyed or corrupted, backup copies would enable the restoration of the Archive;
Should a hacker succeed in tampering with one of the copies of an instance, it is highly unlikely that s/he would succeed in tampering with all copies in the different storage spaces and on all backup tapes, and the instance could be restored from the intact copies;
Should quality control or preservation activity change an instance to an unacceptable degree, or alter the intellectual content at all, the preservation master can be returned to for an exact replication of the title on the publisher’s site.
14 An ‘instance’ is a single gathering of a title. It includes the gathering of a monograph that has been archived once only, the first gathering of a serial title or integrating title (for example a Web site that changes over time), and all subsequent gatherings.
4.4.14 Another aspect of integrity is persistence. Each item in PANDORA, from title level right down through instances and parts of instances to component files, has a unique persistent identifier automatically assigned by the digital archiving system. This enables authors to cite works and parts of works in the Archive using the appropriate persistent identifier. Readers can return to the cited item in the Archive again and again, confident that it will remain there persistently and that it will remain the same.
Threat
4.4.15 The Library intends to provide perpetual access to the PANDORA Archive. It is a collection of Internet publications, many of which are complex digital objects, comprising a number of different file types. This poses a significant challenge from the point of view of preservation, as the software and hardware required to display them change relatively quickly.
Preservation strategies for the Archive are outlined in section 6, Management Plan, and the National Library’s Digital Preservation Policy is attached at Appendix 1.
4.4.16 The Library has recently completed a risk assessment of the Archive. One of the significant risks is the cost of digital preservation and the possibility that the Library may not have the funds to carry out the necessary preservation action.
4.4.17 ‘Threat’ has another aspect in relation to the PANDORA Archive – not a threat to the material in the Archive but the threat to significant material that cannot be brought into the Archive because of insufficient resources to do so. Archiving and managing online publications is labour-intensive and therefore costly.
4.4.18 Despite their best efforts, the PANDORA partners are managing to archive only a small proportion of what ideally should be contained in the Archive. The National Library continues to research improved methods of archiving to increase the volume that can be handled by the available staff. It is currently investigating methods for automating identification, selection, description, archiving and quality control of Commonwealth government publications, which would greatly improve the rate of intake.
4.4.19 There are legal, political, social, organisational and financial factors which place items in the Archive, as well as those not yet archived, at risk. There is still a low level of recognition among creators of information, publishers (including commercial publishers), government agencies and politicians of the importance of keeping online information accessible for future generations. The absence of legal deposit legislation at the Commonwealth level is a major impediment to archiving and retention of publications. None of the agencies contributing to PANDORA are adequately funded to do the work and none, therefore, are able to archive to the full extent of their selection guidelines.
4.4.20 Listing on the Memory of the World Register would demonstrate the social and cultural importance of this part of our documentary heritage. It would lift the profile of the Archive, draw public attention to the importance of keeping our online heritage accessible, and may improve the prospects of obtaining funds for both collection building and preservation.
5 LEGAL INFORMATION
5.1 Owner of the documentary heritage (name and contact details)
The National Library of Australia is the owner of PANDORA and its technical infrastructure.
Contact details
Pam Gatenby
Assistant Director General
Collections Management
Address National Library of Australia Parkes Place ACT 2600
Telephone 02 6262 1672
Fax 02 6273 2545
Email pgatenby@nla.gov.au
5.2 Custodian of the documentary heritage (name and contact details, if different to owner)
The National Library of Australia is the custodian of PANDORA and its technical infrastructure. Contact details are the same as for owner.
5.3 Legal status:
(a) Category of ownership
5.3.1 Public institution
(b) Accessibility
5.3.2 Titles in the PANDORA Archive are accessible via the Internet at <http://pandora.nla.gov.au/index.html>. Most titles are freely available, to anyone anywhere in the world with access to the Web.
5.3.3 Access is restricted for 144 titles, usually because the title is still commercially viable and it is necessary to protect the publishers’ revenue for a period of time that is negotiated with the publisher. Examples include: Safety at Work http://pandora.nla.gov.au/tep/14079; Federal Law Review http://pandora.nla.gov.au/tep/11156; and Justinian: E-news http://pandora.nla.gov.au/tep/10397. Restrictions currently range in duration from three months to 99 years, with 103 titles restricted for less than five years. Thirty titles are restricted because of sensitive content and may only be viewed after successful application to the National Library for a log in and password.
5.3.4 People can find out about titles that are in the Archive by searching the National Bibliographic Database (NBD) and partners’ local catalogues. Access is provided via hotlinks in the catalogue record. Access is also available via subject and title lists on the PANDORA home page, and a search engine that indexes the Archive. Search engines, such as Google and Yahoo, index the Archive down to the level of individual titles, but not the contents of the titles.
5.3.5 The policy of providing full catalogue records in the NBD for each title in PANDORA means that online publications and web sites become part of the national bibliography, as recommended by the International Federation of Library Associations15. The Archive is one of the few in the world to routinely include records for online publications in the national bibliography.
15 International Federation of Library Associations (IFLA). The final recommendations of the International Conference on National Bibliographic Services, 1998. URL: http://www.ifla.org/VI/3/icnbs/fina.htm. Consulted 3 May 2004
5.3.6 5.3.7 5.3.8 5.3.9 (c) Copyright status Copyright in titles held in PANDORA remains with the original copyright owner. Where the publisher provides a copyright statement within the publication or web site, a link to it is provided from within PANDORA. General information about copyright is also provided. Given the lack of legal deposit legislation for online publications, partners negotiate with publishers for permission to copy their publications into the Archive, to provide access to them, and to make copies for the purposes of preservation. (d) Responsible administration The National Library of Australia administers PANDORA. Having established the Archive and developed the technical infrastructure to support acquisition, description, storage, management, preservation and access, the Library invited other collecting agencies to contribute to it (the partner organisations are listed in footnote 1).
Nine other partners contribute publications and web sites to the Archive using PANDAS, the web-based software developed by the National Library. The Archive is stored centrally at the National Library. Partners participate in administration of the Archive through the PANDORA Consultative Committee, which discusses policy and strategy, and a committee of operational staff that discusses matters of day-to-day archiving. Each committee has an associated discussion list and teleconferences are held to discuss matters of business.
(e) Other factors: For example, is any institution required by law to preserve the documentary heritage in this nomination?
5.3.10 The National Library of Australia is required to develop a comprehensive collection of Australian documentary material under the National Library Act 1960. Each of the other partners is required by the legislation under which it is established to develop collections of documentary heritage relevant to its jurisdiction.
5.3.11 Legal deposit provisions assist the National and State libraries to carry out their mandates for print publications. At the Federal level, however, the legal deposit provisions of the Copyright Act 1968 do not cover electronic publications, so the National Library and its partners negotiate permission to archive publications and web sites with each publisher.
5.3.12 A licence between the Commonwealth of Australia through the Department of Communications, Information Technology and the Arts and the National Library permits the Library to archive Commonwealth copyright material published on specified Commonwealth government web sites.
5.3.13 The State library partners are governed also by State legislation dealing with legal deposit. None of them has legislation that unambiguously supports the deposit of online publications.
5.3.14 The State Library of New South Wales is aided by the Premier’s Department Memorandum No.
2000 – 15, Access to Published Information – Laws, Policy and Guidelines, which mandates the deposit of copies of all government publications, in every format, with designated institutions, including the State Library.
5.3.15 In Western Australia online government publications are required to be deposited with the State Library under a Premier’s Circular.
6 MANAGEMENT PLAN
6.1 The management plan is made up of the following documents, which assist the safe and effective management of PANDORA for preservation and access:
Online Australian Publications: Selection Guidelines for Archiving and Preservation by the National Library of Australia. <http://pandora.nla.gov.au/selectionguidelines.html>
A Digital Preservation Policy for the National Library of Australia <http://www.nla.gov.au/policy/digpres.html> which includes the PANDORA Archive. This document is attached at Appendix 1.
6.1.2 To date there are no completely dependable preservation strategies available for online formats. However, the Library takes a very pro-active approach to digital preservation. It keeps abreast of international research and development in the area. It, in fact, has contributed to this research, for example, in the areas of preservation metadata and migration of html files.
6.1.3 To preserve access to titles in the PANDORA Archive the Library will employ:
some technology preservation, including maintenance of software and even some hardware;
negotiating with publishers to supply stable source files of some streaming or dynamic formats;
migration strategies for those file formats which correspond to compatible new formats and which are amenable to mass conversion;
use of emulators, if they can be found or developed, for some file formats; and
simply keeping and refreshing some files, not amenable to migration or emulation, in the hope that a suitable access pathway will emerge.
6.1.4 The Library has conducted a risk assessment of its digital collections, with particular focus on PANDORA. This assessment identifies in detail the risks involved in specific file types that make up the complex digital objects which comprise the PANDORA Archive and recommends actions that will be taken to minimize those risks.
6.1.5 As detailed in section 4.1 and 4.4, the Library keeps at least three copies of each ‘instance’ in the Archive and stores them separately. This practice is part of the overall preservation strategy, ensuring that the Archive or parts of it could be restored in the event of loss or damage.
7 CONSULTATION
7.1 The National Library of Australia is both the owner and the custodian of PANDORA. It has consulted the PANDORA partners, who were unanimously in favour of PANDORA being nominated for the Memory of the World Register.
7.1.2 PANDORA has been nominated for inscription on the Australian Memory of the World Register and the Australian Committee recommended that it be nominated for the World Register.
PART B – SUBSIDIARY INFORMATION
8 ASSESSMENT OF RISK
8.1 As yet there are no known completely reliable strategies for digital preservation, and this places the content of the PANDORA Archive at some risk. However, as detailed under sections 4.1 and 4.4 above, this risk is being assessed, documented and actively managed. One of the big threats to the Archive is the cost of archiving and digital preservation and the fact that no additional funds have been made available to any of the contributing partners for archiving and preservation work. Listing on the World Register would draw attention to the importance of this type of documentary heritage in Australia and may increase the likelihood of funding by government.
8.1.2 The National Library is actively involved in research in aspects of preserving digital objects and keeps abreast of research being carried out by others around the world.
9 ASSESSMENT OF PRESERVATION

9.1 The PANDORA Archive is subject to the same rigorous management and preservation strategies as all of the National Library’s other digital collections.

Careful documentation and collection control

9.1.2 All titles archived in PANDORA are catalogued and a record goes into partner agencies’ own catalogues and into the National Bibliographic Database. In addition, the PANDORA Digital Archiving System (PANDAS) is used to record other information about the title, including contact with publishers, date/s of archiving, any special matters relating to the title, restrictions, and some preservation metadata. Correspondence with publishers, for instance that relating to obtaining permission to archive, is kept on files in partners’ records management systems.

9.1.3 The Archive is supported by PANDAS, which was built by the National Library specifically to aid in the management, control and preservation of the collection. One function of PANDAS is to allocate a persistent identifier to every title registered in the system. This assists with the unique identification of titles and persistent access to them.

Storage environments

9.1.4 As explained in sections 4.1 and 4.4, two copies of each instance archived in PANDORA are stored in the Library’s Digital Object Storage System (DOSS), under optimal conditions for security and preservation. Regular back-ups are made in line with standard practice, and copies or back-ups are stored off-site.

“Prevention is better than cure”

9.1.5 It is impossible to prevent the greatest risk to digital collections, that is, the changing hardware and software on which they are dependent. However, the National Library endeavours to be as prepared as possible for this eventuality. It has conducted a risk assessment of its digital collections, with special emphasis on PANDORA, which is the most complex.
PANDAS has been designed to collect some preservation metadata and is currently being enhanced to collect more and make it more accessible. As explained above, the Library takes other action to mitigate possible disasters such as loss or corruption of data.

Conserving an original document

9.1.6 In the online world it is not the original document that is being preserved, but a copy. In making this copy, the PANDORA partners pay particular attention to ensuring the intellectual content is preserved. In addition, it is our policy to maintain the ‘look and feel’ for as long as possible. This may become difficult in some cases as software and hardware change and we have to resort to preservation strategies such as migration. Keeping preservation metadata is an important part of this process.

Content migration or reformatting

9.1.7 This digital archive will inevitably have to resort to migration or reformatting in order to maintain long-term access to its contents. The National Library is aware of the risks associated with loss of information and the closing off of future options. It will do its utmost to mitigate these risks. One of its strategies, as mentioned in section 4.1, is to keep a copy of the publication or web site as it has been downloaded from the publisher’s site, without any change. If in future it is found that a preservation strategy or series of them has led to closing off of future options, then we can always go back to this original copy and begin again with later technology.

Putting long-term preservation at risk in order to satisfy short-term access demand

9.1.8 This is not relevant to digital materials, as long as preservation copies are kept, as is the case for items in PANDORA.

One size doesn’t fit all

9.1.9 This is of particular relevance to a collection such as PANDORA, where even a single title often consists of many different file types.
The hardware and software required to display the different file types may become obsolete at different times and require individual intervention. This makes preservation of the contents of PANDORA a complex and expensive activity. The National Library has taken a significant step towards dealing with this by conducting a detailed risk assessment on all categories of formats contained in the Archive, which will lead to the formulation of preservation strategies for each one.

Cooperation is essential

9.1.10 As already explained in the nomination, the National Library cooperates with nine partners to build the PANDORA Archive. It has also coordinated the development of UNESCO’s Guidelines for the Preservation of Digital Heritage, which were designed to assist countries to develop policies and procedures on collecting and preserving digital heritage. In addition, the National Library is a foundation member of the International Internet Preservation Consortium, which is a consortium of national libraries working together to share knowledge, costs and development effort.

Traditional knowledge

9.1.11 Australian Indigenous peoples have been quick to use the Internet to record their culture and strengthen their identity. PANDORA already contains 197 sites by or about Indigenous culture, and the newest PANDORA partner, the Australian Institute of Aboriginal and Torres Strait Islander Studies, will add to this collection through its specialised knowledge of and contacts in this sector.

Standard of professionalism

9.1.12 The National Library is committed to active development of its staff and also supports them in seeking opportunities to enhance their skills and professional abilities. It provides training and ongoing support to the staff of PANDORA partners who are engaged in this web archiving program. It also passes on to the staff of many other institutions in Australia and around the world the knowledge and expertise that it has developed in web archiving.
This is done through documentation of its policies, procedures and guidelines on its web site, hosting visits from agencies who wish to learn about our practices, the UNESCO guidelines mentioned above in 4.2.18, making available the PANDAS software for evaluation by other agencies, and organising the Archiving Web Resources International Conference in November 2004.

PART C – LODGEMENT

This nomination is lodged by:
(Please print name) Janice Lillian Fullerton
(Signature)
(Date) 30 June 2004

Appendix 1

A Digital Preservation Policy for the National Library of Australia

Purpose

This policy statement indicates the directions the National Library of Australia intends to take in preserving its own digital collections, and in collaborating with others to enable the preservation of other digital information resources likely to be of value to NLA users.

The objectives of the Library’s digital preservation activities

The National Library’s preservation role is guided by its key objective to preserve and maintain all Australian and significant non-Australian library materials to ensure they are available for current and future use. This objective applies to both digital and non-digital information resources, although the Library recognises that it will use different methods and draw on different skills, procedures and partnerships, for managing digital and non-digital collections. The National Library also seeks to help others preserve the Australian information resources for which they accept responsibility.

The nature of the Library’s digital collections

The National Library began acquiring digital information resources of enduring value in the mid-1980s or earlier.
By mid-2001 the digital information resources for which the Library accepts some level of preservation responsibility included:
Australian online resources originally published on the Internet and selected for inclusion in the Library’s PANDORA archive;
Australian digital publications originally published on physical carriers such as diskettes and CD-ROMs;
Digital audio collections created as part of the Library’s Oral History collections;
Preservation master digital copies of analogue material from the Library’s collections (and in some cases, from the collections of other institutions) resulting from digitisation programs;
Computer files on physical carriers such as diskettes forming part of various Manuscript collections;
Original digital images forming part of various Pictorial collections;
Digital mapping products chosen for long-term retention by the Maps Collection;
Computer files containing transcripts of the Library’s Oral History collections;
The Library’s corporate records in digital form;
Components of the Library’s website chosen for ongoing retention;
Digital materials from regional partners where the Library accepts a preservation role;
Metadata records of information resources.

Over time, this list of resource types can be expected to change in response to new business needs and opportunities such as domain-wide web harvesting, archiving of email discussion lists, and new forms of electronic communication.

The Library maintains various other policy documents and guidelines relevant to the way digital resources are created, selected, acquired, described and accessed, which can be found at http://www.nla.gov.au/policy/.

The challenges of keeping digital information resources accessible

The Library defines digital preservation as the processes involved in maintaining, and if necessary, recovering accessibility to digital information resources. Ongoing accessibility of digital resources is threatened by many challenges.
The challenges include:
The volume and growth in the amount of material to be maintained;
Widespread use of relatively unstable media;
Rapid changes in the availability of hardware, software and other technology required for access;
The diverse and frequently changing range of file formats and standards;
Uncertainty about the significant properties that must be maintained for different digital resources;
The recurrent, critical and demanding nature of the threats to accessibility;
Uncertainty about strategies and techniques for addressing the threats;
The high costs of taking action;
Administrative complexities in ensuring timely and cost-effective action is taken over very long periods of time;
New models of ‘ownership’ that impose property and other rights-based constraints;
The as-yet ambiguous intentions and capabilities of a range of agents with a potential impact on accessibility.

Broad directions for the Library’s digital collections

Scope

The Library intends to preserve all the digital materials covered by this policy. However, it is likely that the Library will need to allocate priorities for action, based on the relative significance of particular materials and the technical complexity of preserving access to them.

Models

The Library believes its digital archiving and preservation objectives will be best achieved by developing practices that comply with an adequate, coherent and widely understood framework for reliable, accountable and manageable digital archives. The Library will use the broad understandings and concepts embodied in the Open Archival Information Systems (OAIS) Reference Model, when released as an agreed ISO Standard, as a conceptual check for its own archive construction and management. In developing systems and infrastructure to manage its digital collections, the Library will operate within the principles of reliable digital repositories as defined by relevant international standards and best practices.
The Library also recognises the value of practical experiments in developing useful new approaches. The Library will continue to develop process models addressing its particular business needs, clearly identifying and articulating points at which its practice does not comply with the OAIS Reference Model, and why.

Preserving accessibility

A key concept of the OAIS model is that of the “archival information package”, consisting of the digital object itself and all the information required to understand, present, and manage it.

In preserving the accessibility of digital resources, the Library will attend to three aspects of accessibility:
Maintaining the archival information package: the byte-stream which constitutes the digital object and the information needed to present it as a meaningful reproduction of the originally presented digital object;
Maintaining means of accessing an acceptable presentation of the digital object; and
Maintaining the ability to locate the digital object reliably.
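
The idea of an archival information package can be illustrated with a minimal sketch. It assumes a much-simplified package holding only an identifier, the byte-stream, a media type and a SHA-256 checksum. The class, field and function names, the identifier value and the choice of SHA-256 are all hypothetical illustrations; they do not describe how PANDAS, DOSS or the OAIS Reference Model are actually implemented.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ArchivalPackage:
    """Hypothetical, simplified stand-in for an archival information package."""

    identifier: str   # persistent identifier used to locate the object
    content: bytes    # the byte-stream that constitutes the digital object
    media_type: str   # information needed to present the byte-stream
    sha256: str = field(init=False)  # checksum recorded when the package is created

    def __post_init__(self) -> None:
        self.sha256 = hashlib.sha256(self.content).hexdigest()

    def is_intact(self, stored_copy: bytes) -> bool:
        """Return True if a stored copy still matches the recorded checksum."""
        return hashlib.sha256(stored_copy).hexdigest() == self.sha256


def find_damaged_copies(package: ArchivalPackage, copies: dict[str, bytes]) -> list[str]:
    """Return the names of stored copies whose bytes no longer match the package."""
    return sorted(name for name, data in copies.items() if not package.is_intact(data))


if __name__ == "__main__":
    # Illustrative values only; the identifier and copy names are made up.
    package = ArchivalPackage("example-id-0001", b"<html>example</html>", "text/html")
    copies = {
        "primary": b"<html>example</html>",
        "offsite": b"<html>exampel</html>",  # simulated corruption
    }
    print(find_damaged_copies(package, copies))
```

Recording a checksum alongside the byte-stream is one common way to detect silent corruption in any of several separately stored copies, so that a damaged copy can be replaced from an intact one.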
Implementation - NLA digital collections

In implementing this policy with regard to its own collections, the Library will:
Monitor the preservation implications of any systems designed or put in place to manage its digital collections;
Apply priorities for preservation in an accountable manner in accord with publicly available guidelines;
Store and manage digital resources in ways that secure the integrity of the byte-stream, including appropriate automated checking, archiving and back-up regimes with high levels of redundancy and security, and with best practice disaster preparedness and recovery procedures;
Document its collections, including file formats, their software and hardware dependencies, and any features needing to be accommodated;
Define the significant properties (such as formatting/“look and feel”, functionality, and information content) that need to be preserved for particular classes of resources;
Record preservation metadata that facilitate effective and efficient management, and enable the significant properties defined for specific resources to be re-presented;
Understand and document the threats to which the accessibility of its collections will be exposed;
Develop and monitor indicators of threats, aiming to understand when preservation action is needed with sufficient lead time to allow effective, manageable action to be taken;
Declare its preservation intentions responsibly, realistically, and explicitly, and regularly report on its performance, including classes of resources it is unable to maintain or to which it cannot currently provide access;
Develop and apply appropriate accessibility pathways for specific collections and formats.
These pathways must:
provide access to the defined significant properties;
maintain evidence of authenticity;
comply with intellectual property rights and with other legal and moral rights related to copying, storage, modification and use of specific resources;
be cost-effective; and
support ongoing access and preservation over time.

At this stage, the Library believes it will need to use a range of approaches to maintain access to all the digital resources in its care. It assumes that format migration will be an effective pathway for large numbers of files in ubiquitous open standard formats such as image and audio files generated by the Library's own digitisation programs. On the other hand, pathways such as emulation of hardware or software, software archiving, and viewer migration will probably be needed for file formats for which there is no safe migration path.

Research

Current knowledge is not adequate to address these policy directions satisfactorily; needs will also change over time. Therefore, the Library will continue to encourage and undertake research into areas in which it requires more knowledge to support its decision-making processes. Such areas include:
The practicalities of storage, media refreshing and data transfer;
Appropriate levels and kinds of duplication and redundancy of storage;
Access pathways including migration of data between file formats; emulation of hardware and software platforms; maintenance of access technologies including access to required software; and data recovery;
Appropriate levels of diversity in managing digital collections;
Options for dealing with resource types currently not well represented in the Library's collections, such as email messages or database-structured resources;
Emerging risks and strategies for dealing with them;
Quality control issues.

Standards development

The Library believes that standards are a vital element in developing reliable archiving and preservation procedures.
The Library will continue to support standards development in Australia and internationally where it appears to offer direct or indirect support to the Library's business needs.

Working with others to preserve the nation's digital information resources

The National Library is one of many players with an interest in ensuring the national documentary heritage is preserved and accessible. Others with such an interest may include State, Territory and University libraries; some public libraries; national, state and organisational archival agencies; museums; creators, publishers, and re-users of resources; information users; government, and the community generally. The Library seeks to work with others who are taking, or could take, responsibility for preserving components of Australia's digital information resources. In working with such partners, the Library wishes to:
Identify appropriate partners and stakeholders able to contribute to a national effort;
Establish agreements on responsibilities and roles;
Pursue explicit, mutually-accountable agreements that provide a reliable basis for ongoing accessibility over long periods;
Help identify, develop and promote policies, procedures and tools to support such an aim;
Work with creators, publishers and re-users of digital content to encourage practices that will enable, rather than hinder, preservation;
Work with governments to develop legislative and funding frameworks that will enable cost-effective preservation.

Working with others to foster digital preservation

The Library places great importance on working with others in Australia and internationally to develop practices, strategies and understandings in support of digital preservation. This is evidenced in the Library’s commitment to the PADI service, an international subject gateway on digital preservation.
The Library seeks to:
Freely share information on its approaches, policies, systems, strategies, tools and experiences;
Learn from the experience of others as well as its own experience;
Maintain and improve mechanisms for comparing approaches and strategies, and for reviewing developments, through the PADI website. The Library will continue to seek a level of buy-in to PADI that will enhance its role as an internationally useful subject gateway;
Participate in research and development projects where there is an identifiable benefit to the Library and to others involved in preserving Australia's digital information resources.

Contact

The Director of the National Library’s Preservation Services Branch is responsible for maintenance and implementation of this Digital Preservation Policy.

17 July 2001; revised 24 February 2002